Python meets microservices – useful tips based on experience
It is nearly impossible nowadays not to hear about microservices. The word is so buzzy that everyone, developers and managers alike, is talking, writing and thinking about it. In this blog post I'd like to focus on some useful cases concerning Python and microservices together.
To begin with, it's worth explaining why Python gets along well with microservices:
- easy to start with: fast prototyping gives you a working API quickly and easily
- great microframeworks ready to use, like Flask
- asynchronous services with Tornado or Twisted
- a lot of useful packages: requests, uritemplate, rfc3339, flask-restless
- clients to popular services like RabbitMQ, Redis, MongoDB
- language advantages such as the close resemblance of JSON (the most popular REST data format) to Python's dicts (see the short sketch after this list)
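As a quick illustration of that last point, here is a tiny sketch using only the standard library: a JSON payload maps almost one-to-one onto a Python dict and back (the field names are made up for the example).

import json

payload = '{"id": 1, "name": "AC/DC", "active": true}'
artist = json.loads(payload)         # -> {'id': 1, 'name': 'AC/DC', 'active': True}
artist['albums'] = ['High Voltage']  # plain dict manipulation
print(json.dumps(artist))            # and back to JSON in one call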
Synchronous WSGI – help yourself with the asynchrony of gevent or asyncio
The WSGI (Web Server Gateway Interface) standard is specified by PEP 3333 and was inspired by CGI (Common Gateway Interface). A few key features that come with WSGI are worth mentioning, because they are great for creating microservices. First, flexibility: we can change the backend server without changing the application's code (e.g. switching from an Nginx-based stack to an Apache-based one). Second, scalability handled by the WSGI server itself, which lets us add instances when the request load is high. No less important are reusable middleware components for dealing with caches, sessions, authorization and so on.
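To make the interface itself concrete, here is a minimal sketch of a bare WSGI application served with the standard library's wsgiref server (the JSON body and the port are arbitrary choices for the example); any WSGI server such as Gunicorn, uWSGI or mod_wsgi could host the same callable unchanged.

# minimal_wsgi.py
from wsgiref.simple_server import make_server

def application(environ, start_response):
    # environ carries the request data; start_response sets status and headers
    start_response('200 OK', [('Content-Type', 'application/json')])
    return [b'{"status": "ok"}']

if __name__ == '__main__':
    # any WSGI-compliant server could serve `application` instead of wsgiref
    make_server('', 8000, application).serve_forever()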
As encouraging as this short introduction to WSGI may sound, microservices are ultimately about sending a request and getting a response back. The time needed to receive the data can sometimes become a bottleneck for synchronous frameworks.
That's why Tornado and Twisted, mentioned in the introduction, keep growing in popularity: being built around callbacks, they handle asynchrony pretty well. I am not saying that callbacks are a cure-all; it is important to highlight that they may help in some cases. Nevertheless, there is nothing wrong with implementing a microservice on top of the WSGI standard, as long as the app works in the manner of 1 request = 1 thread. If a synchronous service is still a deal breaker, there is a trick to speed up our app using gevent, or asyncio might be the answer (more on that at the end of this chapter).
Before I proceed to some code samples, I'd like to remind you once more that concurrent code is not always the answer. I'd even make the bold statement that it should be used only when other solutions have failed. Why? Because it complicates the code and makes it harder to debug, not to mention the synchronization needed between concurrent fragments of code that share data. So it doesn't come for free (by the way, have you ever heard the term "callback hell" in the JavaScript world?) and should be used only when there are clear indications for it, such as a lot of time spent waiting for response data while CPU usage stays low.
Gevent is a concurrency library that provides an API for numerous concurrency and network-related tasks. It is based on greenlet (a coroutine module written as a C extension) and may reduce the time needed to handle multiple calls to our endpoints.
Let's create two Python files, example_without_gevent.py and example_with_gevent.py, and then time them. Convince yourself.
# example_without_gevent.py
import requests


def run():
    urls = [
        'http://www.google.com',
        'http://www.python.org',
        'http://www.wikipedia.org',
        'http://www.github.com',
    ]
    responses = [requests.get(url) for url in urls]
$ python -mtimeit -n 3 'import example_without_gevent' 'example_without_gevent.run()'
3 loops, best of 3: 1.52 sec per loop
# example_with_gevent.py
import gevent
from gevent import monkey

monkey.patch_all()

import requests


def run():
    urls = [
        'http://www.google.com',
        'http://www.python.org',
        'http://www.wikipedia.org',
        'http://www.github.com',
    ]
    # spawn one greenlet per URL and wait for all of them to finish
    jobs = [gevent.spawn(requests.get, url) for url in urls]
    gevent.joinall(jobs, timeout=5)
    responses = [job.value for job in jobs]
$ python -mtimeit -n 3 'import example_with_gevent' 'example_with_gevent.run()'
3 loops, best of 3: 647 msec per loop
Not everything is rosy, though. For gevent to work properly, all the code it touches has to be compatible with its monkey patching. That's why some packages developed by the community sometimes conflict with each other (especially C extensions). Anyway, in most cases you're not going to face this yourself.
The other way, and arguably the prettiest and most modern one, is asyncio. Introduced in Python 3.4, asyncio allows you to write concurrent code by providing a high-level API (coroutines, synchronization of concurrent code, subprocess control), a low-level API (event loops) and, since Python 3.5, the new async/await keywords. If your project allows you to use recent Python releases, it's probably the best way of dealing with concurrency. More detailed information can be found on the official website: https://docs.python.org/3/library/asyncio.html
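For completeness, here is a sketch of the earlier four-URL fetch rewritten with asyncio; it assumes Python 3.7+ and the third-party aiohttp package, since requests is not asyncio-aware.

# example_with_asyncio.py
import asyncio

import aiohttp


async def fetch(session, url):
    async with session.get(url) as response:
        return await response.text()


async def run():
    urls = [
        'http://www.google.com',
        'http://www.python.org',
        'http://www.wikipedia.org',
        'http://www.github.com',
    ]
    async with aiohttp.ClientSession() as session:
        # gather schedules all requests concurrently on the event loop
        return await asyncio.gather(*(fetch(session, url) for url in urls))


if __name__ == '__main__':
    responses = asyncio.run(run())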
Analysis of security vulnerabilities using Bandit
The OpenStack community designed and created a tool called Bandit to find common security weaknesses (e.g. SQL injection). As a result of a vulnerability scan, the user gets a clean console output pointing at the cases that failed during the run.
Let's intentionally create example code with security issues (the offending lines are marked with comments).
# 1st issue related to subprocess
import subprocess
import yaml
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker

def read_config(file_name):
    with open(file_name) as config:
        # 2nd issue – unsafe yaml load
        data = yaml.load(config.read())

def run_command(cmd):
    # 3rd issue – shell=True
    return subprocess.check_call(cmd, shell=True)

db = create_engine('sqlite://localhost')
Session = sessionmaker(bind=db)

def get_product(id):
    session = Session()
    # 4th issue – SQL injection
    query = "select * from products where id='%s'" % id
    return session.execute(query)
Run the following command to execute the Bandit checks on a file:
$ bandit example.py
This is the result:
Test results:
>> Issue: [B404:blacklist] Consider possible security implications associated with subprocess module.
Severity: Low Confidence: High
Location: example.py:1
More Info: https://bandit.readthedocs.io/en/latest/blacklists/blacklist_imports.html#b404-import-subprocess
1 import subprocess
2 import yaml
3 from sqlalchemy import create_engine
--------------------------------------------------
>> Issue: [B506:yaml_load] Use of unsafe yaml load. Allows instantiation of arbitrary objects. Consider yaml.safe_load().
Severity: Medium Confidence: High
Location: example.py:9
More Info: https://bandit.readthedocs.io/en/latest/plugins/b506_yaml_load.html
8 with open(file_name) as config:
9 data = yaml.load(config.read())
10
--------------------------------------------------
>> Issue: [B602:subprocess_popen_with_shell_equals_true] subprocess call with shell=True identified, security issue.
Severity: High Confidence: High
Location: example.py:13
More Info: https://bandit.readthedocs.io/en/latest/plugins/b602_subprocess_popen_with_shell_equals_true.html
12 def run_command(cmd):
13 return subprocess.check_call(cmd, shell=True)
14
--------------------------------------------------
>> Issue: [B608:hardcoded_sql_expressions] Possible SQL injection vector through string-based query construction.
Severity: Medium Confidence: Low
Location: example.py:22
More Info: https://bandit.readthedocs.io/en/latest/plugins/b608_hardcoded_sql_expressions.html
21 session = Session()
22 query = "select * from products where id='%s'" % id
23 return session.execute(query)
--------------------------------------------------
Code scanned:
Total lines of code: 15
Total lines skipped (#nosec): 0
Run metrics:
Total issues (by severity):
Undefined: 0.0
Low: 1.0
Medium: 2.0
High: 1.0
Total issues (by confidence):
Undefined: 0.0
Low: 1.0
Medium: 0.0
High: 3.0
Bandit ships with dozens of checks. For Flask there is one check worth mentioning: whether debug is set to True, which is fine in development instances but not in production ones. Since creating a Flask application is shorter than a single System.out.println() in Java, I don't hesitate to place a basic example below:
from flask import Flask

app = Flask(__name__)

@app.route('/')
def index():
    return 'Nothing is here'

if __name__ == '__main__':
    app.run(debug=True)
Running a Bandit check on that code produces the following result:
Test results:
>> Issue: [B201:flask_debug_true] A Flask app appears to be run with debug=True, which exposes the Werkzeug debugger and allows the execution of arbitrary code.
Severity: High Confidence: Medium
Location: app.py:13
More Info: https://bandit.readthedocs.io/en/latest/plugins/b201_flask_debug_true.html
12 if __name__ == '__main__':
13 app.run(debug=True)
Running such a check on the production branch before deploying to the production environment is definitely a good practice. Note, however, that automated tools like this shouldn't be treated as an oracle; they should be used in addition to proper tests of all kinds. Moreover, since it's a third-party library, it's also a matter of trusting its authors, isn't it?
Bandit is configurable through an INI file:
[bandit]
skips: B201
exclude: tests
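Assuming the settings above are saved in a file called .bandit, a run over the whole source tree might look like this (the src/ path is only an assumption about the project layout):
$ bandit --ini .bandit -r src/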
Manage an API using Flask-Restless
Flask-Restless basically provides a mapping between the database and the model, simplifying generation of an API for the model without writing routes, since accessing database tables works pretty much the same way for all entities. As a result of a GET request on a specific model, Flask-Restless returns JSON.
We're going to use the example SQLite database available at http://www.sqlitetutorial.net/sqlite-sample-database/ as a local resource. The file is called chinook.db. Here's the tree of my working directory (in case you want to look something up):
flask_restless_with_sqlite_example/
    config.cfg
    database/
        chinook.db
    requirements.txt
    src/
        my_app.py
    venv/
It's handy to place the database connection string in config.cfg, as shown below:
SQLALCHEMY_DATABASE_URI = 'sqlite:///../database/chinook.db'
DEBUG = True
chinook.db contains more than enough tables, but for the sake of this blog post we're going to focus on the Albums and Artists tables.
my_app.py looks like the following:
from pathlib import Path

import sqlalchemy as db
from flask import Flask
from flask_restless import APIManager
from sqlalchemy.ext.declarative import declarative_base

app = Flask(__name__)
app.config.from_pyfile(Path(Path(__file__).parent, '..', 'config.cfg'))

engine = db.create_engine(app.config['SQLALCHEMY_DATABASE_URI'])
session = db.orm.sessionmaker(bind=engine)()
Base = declarative_base()


class Albums(Base):
    __tablename__ = 'Albums'

    album_id = db.Column('AlbumId', db.Integer, primary_key=True)
    title = db.Column(db.String(160))
    artist_id = db.Column(
        'ArtistId', db.Integer, db.ForeignKey('Artists.ArtistId')
    )


class Artists(Base):
    __tablename__ = 'Artists'

    artist_id = db.Column('ArtistId', db.Integer, primary_key=True)
    name = db.Column(db.String(160))
    albums = db.orm.relationship('Albums', backref='Artists')


manager = APIManager(app, session=session)
manager.create_api(Albums)
manager.create_api(Artists)

if __name__ == '__main__':
    app.run()
By default the application serves data at http://localhost:5000/. Our API is accessed by sending a GET request to http://localhost:5000/api/Albums. As a result, we receive JSON (shown below), and that's it: we are serving data from our database.
{
"num_results": 347,
"objects": [
{
"Artists": {
"artist_id": 1,
"name": "AC/DC"
},
"album_id": 1,
"artist_id": 1,
"title": "For Those About To Rock We Salute You"
},
{
"Artists": {
"artist_id": 2,
"name": "Accept"
},
"album_id": 2,
"artist_id": 2,
"title": "Balls to the Wall"
},
(…most of this JSON has been cut...)
{
"Artists": {
"artist_id": 8,
"name": "Audioslave"
},
"album_id": 10,
"artist_id": 8,
"title": "Audioslave"
}
],
"page": 1,
"total_pages": 35
}
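To consume the endpoint from code rather than a browser, a short sketch with requests could look like this (it assumes the app above is running locally; Flask-Restless also exposes single records by primary key):

import requests

# the whole (paginated) collection
albums = requests.get('http://localhost:5000/api/Albums').json()
print(albums['num_results'], 'albums on', albums['total_pages'], 'pages')

# a single record by primary key
first_album = requests.get('http://localhost:5000/api/Albums/1').json()
print(first_album['title'])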
Conclusion
There is enormous hype around microservices nowadays. Combine it with the popularity of the Python programming language (according to Stack Overflow's statistics, Python is one of the leading technologies people ask about on their site) and we get a pretty duo that handles microservices very well, with libraries ready to be used. In my opinion, it's a fantastic time to learn about microservices in a language as relevant as Python. Take a look at the biggest players on the market: they've already spotted the advantages of Python plus microservices!