Another Japan in the World

Jun Aruga's blog.

Python, Bio Tools and me

My idea

Needless to say, the recent development of the bio industry is remarkable. Since the first full human genome sequence was completed, investment by governments and big companies has grown, and so have the related services.

I am fascinated by that. There are lots of things I want to try, such as several experimental diagnosis services and monitoring goods for the human body.

Not only that, I want to analyze the data by myself, because I live in the IT programming world. In the case of genome analysis, once the raw genome data has been sequenced, we can deal with it as array data in the programming world.
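To show what "array data" can mean here, below is a minimal sketch using only the Python standard library. The DNA fragment is made up for illustration, and the names "sequence", "gc_content" and "codons" are my own assumptions, not part of any bio tool.

```python
from collections import Counter

# Hypothetical short DNA fragment, made up for illustration.
sequence = "ATGGCCATTGTAATGGGCCGC"

# A sequence is just an ordered series of characters,
# so counting each base is a one-liner.
base_counts = Counter(sequence)


def gc_content(seq):
    """Return the fraction of bases in seq that are G or C."""
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


# Slicing works as it does for any array: split into 3-letter codons.
codons = [sequence[i:i + 3] for i in range(0, len(sequence) - 2, 3)]

print(base_counts)
print(round(gc_content(sequence), 3))
print(codons)
```

Real genome data is far larger and usually comes in formats such as FASTA, but the idea of indexing, slicing and counting stays the same.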

I am especially interested in the bioscience of longevity. And I prefer bio computer science (bioinformatics) to doing experiments with test tubes. I am not a scientist or a researcher, so I need a different way from theirs to approach it.

Python and Bio

In bioinformatics, which programming language is popular? It seems that 15 years ago “Perl” was popular.

Now “Python” and “R” are popular.

“Python” and “C” are generally popular for widely used bio tools, such as those used in bio platform services like “Google Genomics”, “Microsoft Genomics”, “Seven Bridges Genomics” and “DDBJ Pipeline”, in API services provided by big institutions, and among people with a programming background. “Java” and “Scala” may also be popular in systems built on big data applications such as “Hadoop” and “Spark”.

“R (R language)” is popular for ad hoc analysis by casual programmers, such as scientists and researchers, who want to analyze data in new ways that have not become common yet.

Python has been more popular in the science industry than other programming languages. Its open source community is also more popular than others.

Each programming language has a bio package to deal with sequence data easily and effectively. Perl: BioPerl, Python: Biopython, Ruby: BioRuby, R: Bioconductor (a bio package management system). We can see the list on The Open Bioinformatics Foundation web site [5].

Biopython is maintained very well.
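As a small taste of what such a package gives you, here is a minimal sketch with Biopython's "Seq" class. It assumes Biopython is installed (for example with "pip install biopython"), and the 9-base sequence is made up for illustration.

```python
from Bio.Seq import Seq

# Hypothetical tiny coding sequence: ATG (start), GCC, TAA (stop).
dna = Seq("ATGGCCTAA")

# Reverse complement of the strand.
reverse_complement = dna.reverse_complement()

# Translate to protein, stopping at the first stop codon.
protein = dna.translate(to_stop=True)

print(reverse_complement)
print(protein)
```

Doing the same by hand would need a complement table and a codon table, so the package saves a lot of work.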

How to start with Python?

Use a “Code Style” tool

The official documentation [6] is rich. If you watch the latest Python Conference (PyCon) videos [7], you can feel the atmosphere and see which topics are popular.

How to learn the best practices? It is good to use a tool to tidy the “Code Style” in your development, and to refer to its source. In Python it is “flake8” [8].
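For example, here is a rough sketch of what tidying with flake8 looks like. It assumes flake8 is installed with "pip install flake8" and run as "flake8 example.py". The file name and the function "gc_ratio" are hypothetical. The messy version is kept in comments, with the flake8 codes I would expect for each line.

```python
# Before tidying, the file might have looked like this:
#
#   import os              # F401: 'os' imported but unused
#   total=len(seq)         # E225: missing whitespace around operator
#
# After tidying, flake8 should report nothing for the code below.


def gc_ratio(seq):
    """Return the fraction of G and C bases in seq."""
    total = len(seq)
    if total == 0:
        return 0.0
    return (seq.count("G") + seq.count("C")) / total


print(gc_ratio("ATGC"))
```

Fixing the warnings one by one is a quick way to pick up the common style rules.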

Read source code

Reading “django” [9] is also good. You may learn the best practices by reading it.

For me, reading “virtualenv” [10] (or the “Lib/venv” directory in Python 3) and “tox” [11] is also fun.

Other documents

I want to introduce one more document, “The Hitchhiker’s Guide to Python!” [12]. It is great.