Release Date: Dec. 23, 2016
Python 3.6.0 was the initial feature release of Python 3.6.
There are now newer security-fix releases of Python 3.6 that supersede 3.6.0, and Python 3.8 is now the latest feature release of Python 3. Get the latest releases of 3.6.x and 3.8.x here. Python 3.6.8 is planned to be the last bugfix release for 3.6.x. Following the release of 3.6.8, we plan to provide security fixes for Python 3.6 as needed through 2021, five years following its initial release.
Among the major new features in Python 3.6 were:
- PEP 468, Preserving Keyword Argument Order
- PEP 487, Simpler customization of class creation
- PEP 495, Local Time Disambiguation
- PEP 498, Literal String Formatting
- PEP 506, Adding A Secrets Module To The Standard Library
- PEP 509, Add a private version to dict
- PEP 515, Underscores in Numeric Literals
- PEP 519, Adding a file system path protocol
- PEP 520, Preserving Class Attribute Definition Order
- PEP 523, Adding a frame evaluation API to CPython
- PEP 524, Make os.urandom() blocking on Linux (during system startup)
- PEP 525, Asynchronous Generators (provisional)
- PEP 526, Syntax for Variable Annotations (provisional)
- PEP 528, Change Windows console encoding to UTF-8
- PEP 529, Change Windows filesystem encoding to UTF-8
- PEP 530, Asynchronous Comprehensions
- Report bugs at https://bugs.python.org.
- Help fund Python and its community.
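A few of the features listed above can be tried in a short script. The following is a minimal sketch, assuming a Python 3.6 or newer interpreter; the helper name `keyword_order` and the `downloads` directory are made up for illustration, and the file size is the gzipped source tarball figure from the table on this page.

```python
import os
import secrets
from pathlib import Path
from typing import Dict


def keyword_order(**kwargs):
    """Hypothetical helper: return keyword names in the order they were passed."""
    # PEP 468: **kwargs preserves the order of the keyword arguments.
    return list(kwargs)


# PEP 515: underscores as visual separators in numeric literals.
file_size = 22_256_403

# PEP 526: variable annotations (they are not enforced at runtime).
sizes: Dict[str, int] = {"source tarball": file_size}

# PEP 498: f-strings evaluate the expressions inside the braces.
summary = f"{len(sizes)} file, {file_size:,} bytes"

# PEP 506: the secrets module produces cryptographically strong tokens.
token = secrets.token_hex(16)

# PEP 519: os.fspath() returns the string form of a path-like object.
# The directory name is illustrative only.
path_text = os.fspath(Path("downloads") / "Python-3.6.0.tgz")

print(summary)
print(keyword_order(b=1, a=2))
print(len(token))
print(path_text)
```

On an older interpreter the script fails at compile time with a SyntaxError, because the underscore literal, the annotation, and the f-string are all new syntax.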
Notes on this release
- If you are building Python from source, beware that the OpenSSL 1.1.0c release, the most recent as of this update, is known to cause Python 3.6 test suite failures and its use should be avoided without additional patches. It is expected that the next release of the OpenSSL 1.1.0 series will fix these problems. See http://bugs.python.org/issue28689 for more information.
- Windows users: The binaries for AMD64 will also work on processors that implement the Intel 64 architecture. (Also known as the 'x64' architecture, and formerly known as both 'EM64T' and 'x86-64'.) They will not work on Intel Itanium Processors (formerly 'IA-64').
- Windows users: If installing Python 3.6.0 as a non-privileged user, you may need to escalate to administrator privileges to install an update to your C runtime libraries.
- Windows users: There are now 'web-based' installers for Windows platforms; the installer will download the needed software components at installation time.
- Windows users: There are redistributable zip files containing the Windows builds, making it easy to redistribute Python as part of another software package. Please see the documentation regarding Embedded Distribution for more information.
- macOS users: If you are using the Python 3.6 from the python.org binary installer linked on this page, please carefully read the Important Information displayed during installation; this information is also available after installation by clicking on /Applications/Python 3.6/ReadMe.rtf. There is important information there about changes in the 3.6.0 installer-supplied Python, particularly with regard to SSL certificate validation.
- macOS users: There is important information about IDLE, Tkinter, and Tcl/Tk on macOS here.
| Version | Operating System | Description | MD5 Sum | File Size (bytes) | GPG |
|---|---|---|---|---|---|
| Gzipped source tarball | Source release | | 3f7062ccf8be76491884d0e47ac8b251 | 22256403 | SIG |
| XZ compressed source tarball | Source release | | 82b143ebbf4514d7e05876bed7a6b1f5 | 16805836 | SIG |
| Mac OS X 64-bit/32-bit installer | macOS | for Mac OS X 10.6 and later | 72acb0175e7622dec7e1b160a43b8c42 | 27442222 | SIG |
| Windows help file | Windows | | 6a842a15ab3b4aa316c91a9779db82ec | 7940890 | SIG |
| Windows x86-64 embeddable zip file | Windows | for AMD64/EM64T/x64 | 0ec0caeea75bae5d2771cf619917c71f | 6925798 | SIG |
| Windows x86-64 executable installer | Windows | for AMD64/EM64T/x64 | 71c9d30c1110abf7f80a428970ab8ec2 | 31505640 | SIG |
| Windows x86-64 web-based installer | Windows | for AMD64/EM64T/x64 | 25b8b6c93a098dfade3b014630f9508e | 1312376 | SIG |
| Windows x86 embeddable zip file | Windows | | 1adf2fb735c5000af32d42c39136727c | 6315855 | SIG |
| Windows x86 executable installer | Windows | | 38d9b036b25725f6acb553d4aece4db4 | 30566536 | SIG |
| Windows x86 web-based installer | Windows | | f71f4590be2cc5cdc43069594d4ea98d | 1286984 | SIG |
Artificial Intelligence (AI) and Machine Learning (ML) are booming these days, powering a wide spectrum of use cases ranging from self-driving cars to drug discovery and beyond. AI and ML have a bright and thriving future ahead of them.
On the other hand, Docker revolutionized the computing world through the introduction of ephemeral, lightweight containers. Containers package all the software required to run inside an image (a stack of read-only layers), with a copy-on-write (COW) layer on top to hold the data written while the container runs.
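The idea of read-only layers topped by a copy-on-write layer can be illustrated with a toy model in plain Python. This is only a sketch of the concept, not how Docker's storage drivers are implemented; the file paths, contents, and variable names are invented for the example.

```python
from collections import ChainMap
from types import MappingProxyType

# Read-only image layers: each maps a file path to its content.
# MappingProxyType rejects writes, like an image layer would.
base_layer = MappingProxyType({"/etc/os-release": "alpine", "/bin/sh": "busybox"})
python_layer = MappingProxyType({"/usr/bin/python": "interpreter"})

# A running container adds a single writable layer on top of the image.
writable_layer = {}
container_fs = ChainMap(writable_layer, python_layer, base_layer)

# Reads fall through to the first layer that contains the path.
print(container_fs["/usr/bin/python"])

# Writes land only in the writable layer; the image layers stay untouched.
container_fs["/etc/os-release"] = "modified"
print(container_fs["/etc/os-release"])
print(base_layer["/etc/os-release"])
```

Discarding `writable_layer` and creating a fresh empty one models starting a new container from the same image: the change disappears and the image layers are exactly as they were.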
Enough talk, let’s get started with building a Python data science container.
Python Data Science Packages
Our Python data science container makes use of the following super cool Python packages:
- NumPy: NumPy (Numerical Python) supports large, multi-dimensional arrays and matrices. It provides fast precompiled functions for mathematical and numerical routines. In addition, NumPy offers powerful data structures for efficient computation on multi-dimensional arrays and matrices.
- SciPy: SciPy provides useful functions for regression, minimization, Fourier transforms, and much more. It builds on NumPy and extends its capabilities. SciPy’s main data structure is again the multi-dimensional array implemented by NumPy. The package contains tools that help with linear algebra, probability theory, integral calculus, and many other tasks.
- Pandas: Pandas offers versatile and powerful tools for manipulating data structures and performing extensive data analysis. It works well with incomplete, unstructured, and unordered real-world data, and comes with tools for shaping, aggregating, analyzing, and visualizing datasets.
- Scikit-learn: Scikit-learn is a Python module integrating a wide range of state-of-the-art machine learning algorithms for medium-scale supervised and unsupervised problems. It is one of the best-known machine learning libraries for Python. The Scikit-learn package focuses on bringing machine learning to non-specialists using a general-purpose high-level language. The primary emphasis is on ease of use, performance, documentation, and API consistency. With minimal dependencies and easy distribution under the simplified BSD license, Scikit-learn is widely used in academic and commercial settings. It exposes a concise and consistent interface to the common machine learning algorithms, making it simple to bring ML into production systems.
- Matplotlib: Matplotlib is a Python 2D plotting library, capable of producing publication quality figures in a wide variety of hardcopy formats and interactive environments across platforms. Matplotlib can be used in Python scripts, the Python and IPython shell, the Jupyter notebook, web application servers, and four graphical user interface toolkits.
- NLTK: NLTK is a leading platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning.
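Once a container is built, a quick way to confirm that the packages above are present is to look up their import names. The sketch below uses only the standard library and does not import the packages themselves; `missing_packages` is a hypothetical helper written for this example. Note that Scikit-learn is imported under the name `sklearn`.

```python
import importlib.util

# Display name -> import name for the packages listed above.
PACKAGES = {
    "NumPy": "numpy",
    "SciPy": "scipy",
    "Pandas": "pandas",
    "Scikit-learn": "sklearn",
    "Matplotlib": "matplotlib",
    "NLTK": "nltk",
}


def missing_packages(modules):
    """Return the display names whose import name cannot be located."""
    return [
        name
        for name, module in modules.items()
        # find_spec() returns None when a top-level module is not installed.
        if importlib.util.find_spec(module) is None
    ]


missing = missing_packages(PACKAGES)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All data science packages are available.")
```

Run inside the container, the script reports any package that failed to install, which is easier to read than a long build log.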
Building the Data Science Container
Python is fast becoming the go-to language for data scientists and for this reason we are going to use Python as the language of choice for building our data science container.
The Base Alpine Linux Image
Alpine Linux is a tiny Linux distribution designed for power users who appreciate security, simplicity and resource efficiency.
As claimed by Alpine:
Small. Simple. Secure. Alpine Linux is a security-oriented, lightweight Linux distribution based on musl libc and busybox.
The Alpine image is surprisingly tiny, at no more than 8 MB, and ships with minimal packages installed to reduce the attack surface of the container. This makes Alpine the image of choice for our data science container.
Downloading and running an Alpine Linux container is as simple as:
In our Dockerfile, we can simply use the Alpine base image as:
Talk is cheap, let’s build the Dockerfile
Now let’s work our way through the Dockerfile.
The FROM directive sets alpine:latest as the base image. Using the WORKDIR directive, we set /var/www as the working directory for our container. ENV PACKAGES lists the software packages required for our container, such as libgfortran. The Python packages for our data science container are defined in the
We have combined all the commands under a single Dockerfile RUN directive to reduce the number of layers, which in turn helps reduce the resulting image size.
Building and tagging the image
Now that we have our Dockerfile defined, navigate to the folder with the Dockerfile using the terminal and build the image using the following command:
The -t flag names and optionally tags the image in the 'name:tag' format. The -f flag specifies the name of the Dockerfile (default is 'PATH/Dockerfile').
Running the container
We have successfully built and tagged the Docker image. Now we can run the container using the following command:
Voila, we are greeted by the sight of a Python shell ready to perform all kinds of cool data science stuff.
Our container comes with Python 2.7, but don’t be sad if you wanna work with Python 3.6. Lo and behold, the Dockerfile for Python 3.6:
Build and tag the image like so:
Run the container like so:
With this, you have a ready-to-use container for doing all kinds of cool data science stuff.
I figure you have the time and resources to set up all this stuff. In case you don’t, you can pull the existing images that I have already built and pushed to Docker Hub, Docker’s registry, using:
After pulling the images, you can use them directly, extend them in your own Dockerfile, or reference them as an image in your docker-compose or stack file.
The world of AI and ML is getting pretty exciting these days and will only become more exciting. Big players are investing heavily in these domains. It is about time you started harnessing the power of data. Who knows, it might lead to something wonderful.
You can check out the code here.
Docker image for python datascience container with NumPy, SciPy, Scikit-learn, Matplotlib, nltk, pandas packages…github.com
I hope this article helped you build containers for your data science projects. Clap if it increased your knowledge, and help it reach more people.