talks/2018-pyconfr-emergence.md

6.6 KiB

The emergence of consensus in Python


Julien Palard

PyCon Fr 2018


There should be one

-- and preferably only one --

obvious way to do it. (Tim Peters)

The emergence of consensus in Python

This is a study about undocumented consensus in the Python community, so you don't have to do it.

Julien Palard

  • Python documentation translator
  • Teaching Python at
    • Sup'Internet
    • CRI-Paris
    • Makina Corpus
  • julien@python.org, @sizeof, https://mdk.fr
  • Yes I write Python sometimes too…

Julien Palard

Digression

In one year, we went from 25.7% translated to 30% translated!

While japanese translation is at 79.2% and korean 14.6%.

Notes: Want news about the translation?

Thanks to Christophe, Antoine, Glyg, HS-157, and 27 other translators!

PLZ HELP

What did I do?

Crawled pypi.org to get some Python projects, cloned their github repo (around 4k repositories at the time of writing).

Then... played with the data ^^.

But why?

To answer all those questions a human or a search engine won't be able to answer.

Notes:

  • For my README, should I use rst or md?
  • unittest, nose, or pytest?
  • setup.py, requirements.txt, Pipfile, ...?
  • ...

Is it data science?

Hell no! It's biased, I only crawled projects published on pypi.org AND hosted on github.com, so I'm hitting a very specific subset of the population.

Note:

I mean

  • me: Hey consensus is to use the MIT license!
  • you: You crawled only open source projects...
  • me: Oh wait...

Digression

I used Jupyter, if you still don't have tried it, please take a look at it.

Meta-Digression

If you're using Jupyter Notebook, and never tried Jupyter Lab, please try it.

JupyterLab will replace Jupyter notebooks, so maybe start using it.

pip install jupyterlab
jupyter-lab

Notes:

I know you like digressions, so I'm putting digressions in my digression so I can digress while I digress.

Meta-Digression

10 years of data

I do not have enough data past this so graphs tends to get messy.

stats = raw_data.loc['2008-01-01':,:].resample('6M')

Notes: While Python is 28 years old (older than git, than github, even than Java).

Digression (again)

I used Pandas, if you never tried it...

Notes: It's a matrix of scatter plots.

README files

readmes = (stats['file:README.rst',
                 'file:README.md',
                 'file:README',
                 'file:README.txt'].mean().plot())

README files

Notes:

Consensus

10 years ago, people used README and README.txt.

It changed around 2011, now we use README.md and README.rst files.

Markdown won. I bet for its simplicity, readability, and also people may know it from elsewhere.

Consensus

But pypi.python.org don't support Markdown!

Yes, but...

Consensus

pypi.org does!

long_description_content_type='text/markdown'

See:

https://pypi.org/project/markdown-description-example/

So use README.md files!

Requirements

setups = stats['file:setup.cfg',
               'file:setup.py',
               'file:requirements.txt',
               'file:Pipfile',
               'file:Pipfile.lock'].mean().plot()

Requirements

Notes:

Nothing really interesting here :( We see the rise of Pipfile, but still can't say much about it...

Requirements

For dependency managment I've seen a lot of philosophies. and it really depends on "are you packaging", "is it an app or a library", …

Digression

The future

PEP 517 and PEP 518

[build-system]
requires = ["flit"]
build-backend = "flit.api:main"

Notes:

are introducing a way to completly remove setuptools and distutils from being a requirement, it make them a choice:

Tests

tests = (raw_data.groupby('test_engine')
         .resample('Y')['test_engine']
         .size()
         .unstack()
         .T
         .fillna(0)
         .apply(lambda line: 100 * line / float(line.sum()), axis=1)
         .plot())

Tests

Notes:

Sorry nose.

Documentation directory

docs = stats['dir:doc/', 'dir:docs/'].mean().plot()

Documentation directory

Note: Some of you are not documenting at all!

Concensus emmerged around 2011 towards docs/ instead of doc/, let's stick to it (please, please, no Docs/, I see you, cpython).

src/ or not src/

src = pd.DataFrame(stats['dir:src/'].mean()).plot()

src/ or not src/

Notes:

This one was slow, but the concensus is to drop the use of a src/ directory.

I used it a lot, convinced it would allow me to spot earlier an import bug ("." being in PYTHONPATH but not "src/"). But that's way overkill for a small win.

tests/ or test/?


has_tests = stats['dir:tests/', 'dir:test/', ].mean().plot()

tests/ or test/?

Note: First thing I see... Not everyone is writing tests.

I'm glad the concensus is as for docs/ and doc/, plural clearly wins. I bet it's semantically better, as the folder does not contain a test, but multiple tests.

pyproject.toml: to declare dependencies of your setup.py

Shebangs

shebangs = (raw_data.loc['2008-01-01':,raw_data.columns
    .map(lambda col: col.startswith('shebang:'))].sum())
top_shebangs = shebangs.sort_values().tail(4).index
shebangs_plot = (raw_data.loc['2008-01-01':, top_shebangs]
    .fillna(value=0).resample('6M').mean().plot())

Shebangs

Notes:

I'm glad there's not so much #!/usr/bin/env python2.7 here.

I'm not sure it's a good idea to specify a version in the shebang, but...

Licenses

top_licenses = raw_data.groupby('license').size().sort_values().tail(10)
licenses = (raw_data.groupby('license')
            .resample('Y')['license']
            .size()
            .unstack()
            .T
            .fillna(0)
            .loc[:, list(top_licenses.index)]
            .apply(lambda line: 100 * line / float(line.sum()), axis=1)
            .plot())

Licenses

Digression

https://choosealicense.com/

Questions?