Giang Mạnh's Notes: March 2014

Mar 27, 2014

The information technology industry

The information technology industry has a few prominent "titles", or prominent jobs, described below.

Programmer: the person who takes on ideas and turns them into reality using programming skills and knowledge of systems.
Most of them tend to be "guys with glasses". You will sit at a computer all day, poring over documentation (10% of the time), typing code (20%), debugging (50%), and going out for coffee (the remaining time) :)

Engineer: a level above the programmer, the engineer is the designer. They set the direction, write requirement specifications, and control the quality and results of the programming work. Sometimes the engineer also takes on part of the programmer's work during difficult phases. At a higher level, an engineer becomes a development team lead, a project lead, or moves into more strategic management roles.

Bridge engineer: an engineer who can communicate in two or more languages.
Nowadays some companies build their business on partners and projects from abroad. In that case the company sends someone to work with the customer, listen to them, and relay the information back to the team members who are not in direct contact.
This person must be an engineer in order to understand technical issues and support them on the spot.

An interesting Nginx config

Limit the number of connections per defined key. Example with the client IP address as the key:

http {
    limit_conn_zone $binary_remote_addr zone=addr:10m;

    server {
        location /download/ {
            limit_conn addr 1;
        }
    }
}

http://nginx.org/en/docs/http/ngx_http_limit_conn_module.html

Getting started with Linux

Linux is an operating system with many interesting features, well suited to study and research. However, it is ugly and fairly hard to use.

To start learning, install Puppy Linux to use alongside Windows:
http://puppylinux.org/

If you want to run it on an old computer, from a USB stick, or in a virtual machine (for example VirtualBox), the following Slitaz release is a good fit (about 40 MB). It can also run from USB/CD and can be installed.
Download:
http://mirror.slitaz.org/iso/4.0/slitaz-4.0.iso

To install it on a USB stick, download the following program and follow the instructions:
http://www.objectif-securite.ch/slitaz/tazusb.exe
Alternatively, you can use one of the fairly popular Ubuntu releases.

Getting started with Python

Python is a simple language that is easy to use and easy to read!
Python is also among the most widely used languages at Google.

To get started on Windows, download:

http://www.python.org/ftp/python/3.3.4/python-3.3.4.msi
To explore and practice programming, you can use the following program:
https://code.google.com/p/pyscripter/downloads/detail?name=PyScripter-v2.5.3-Setup.exe&can=2&q=

On Linux and Mac: Python 2 is already installed.

A short guide to learning Python
http://en.wikibooks.org/wiki/Non-Program...r_Python_3

Mar 24, 2014

Python DBAPI2

It took me an hour to get rows updated using Python on MySQL, specifically with the PyMySQL driver. The problem is that queries are not auto-committed. I have to add a line at the end:

connection.commit()

Moreover, quoting HTML data is troublesome when the SQL string is built by hand; passing the values as query parameters lets the driver escape them. An important note: all result sets are buffered by default. That means all rows are loaded into RAM whether we call fetchone or fetchall. To save memory, we have to use SSCursor:

cursor = connection.cursor(pymysql.cursors.SSCursor)

Here is a skeleton that queries the note field and updates it:

import pymysql

connection = pymysql.connect(host='127.0.0.1', user='root', passwd='', db='name', charset='utf8')
cursor = connection.cursor()
cursor.execute('SELECT id, note FROM products')
rows = cursor.fetchall()
for row in rows:
    # new_note = xyz(row[1])...
    new_note = row[1]  # placeholder: compute the updated note here
    # Pass values as parameters so the driver escapes them
    cursor.execute('UPDATE products SET note=%s WHERE id=%s', (new_note, row[0]))
connection.commit()
cursor.close()
connection.close()
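
The commit behaviour is not specific to MySQL, so it can be tried without a server. Below is a minimal sketch using the standard library's sqlite3 module as a stand-in for PyMySQL (an assumption for illustration: sqlite3 uses "?" placeholders where PyMySQL uses "%s", and the helper names update_note, read_note and demo are hypothetical). It shows that an update is lost when the connection closes without a commit, and that HTML with quotes survives intact when passed as a parameter.

```python
import os
import sqlite3
import tempfile


def update_note(path, product_id, note, commit):
    """Update one row, optionally committing before the connection closes."""
    conn = sqlite3.connect(path)
    try:
        # Values are passed as parameters, so no manual quoting is needed.
        conn.execute("UPDATE products SET note = ? WHERE id = ?", (note, product_id))
        if commit:
            conn.commit()
    finally:
        conn.close()  # closing without a commit discards the pending update


def read_note(path, product_id):
    """Read the note back through a fresh connection."""
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT note FROM products WHERE id = ?", (product_id,)
        ).fetchall()
        return rows[0][0]
    finally:
        conn.close()


def demo():
    fd, path = tempfile.mkstemp(suffix=".db")
    os.close(fd)
    try:
        conn = sqlite3.connect(path)
        conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, note TEXT)")
        conn.execute("INSERT INTO products VALUES (1, 'old')")
        conn.commit()
        conn.close()

        html = "<p class='x'>it's \"quoted\"</p>"
        update_note(path, 1, html, commit=False)
        lost = read_note(path, 1)   # still the old value
        update_note(path, 1, html, commit=True)
        kept = read_note(path, 1)   # now the HTML string
        return lost, kept, html
    finally:
        os.remove(path)
```

Calling demo() returns the value read after the uncommitted update, the value read after the committed one, and the HTML that was written.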

Mar 12, 2014

Stdout UTF8 in Python3

import codecs
import sys
sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
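
To see what the wrapper does without touching the real stdout, here is a small sketch that applies the same codecs.getwriter call to an in-memory binary stream (the names utf8_writer and buffer are only for illustration). On Python 3.7 and later, sys.stdout.reconfigure(encoding="utf-8") is a shorter alternative.

```python
import codecs
import io


def utf8_writer(binary_stream):
    """Wrap a binary stream so that str written to it is encoded as UTF-8."""
    return codecs.getwriter("utf-8")(binary_stream)


# io.BytesIO stands in for sys.stdout.detach(), which is also a binary stream.
buffer = io.BytesIO()
writer = utf8_writer(buffer)
writer.write("Tiếng Việt\n")
encoded = buffer.getvalue()
```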

Mar 6, 2014

In clause in SQL

If we execute the following and the subquery is expected to return thousands of rows, it may take a while (some minutes):

DELETE FROM table
WHERE row_id IN (
    SELECT ref_id FROM ref_table
    WHERE a_col = '..'
)

I discovered the reason for the slowness: the IN clause runs inefficiently on non-indexed items. So here is a solution:

CREATE TABLE tmp_ref AS
    SELECT ref_id FROM ref_table
    WHERE a_col = '..';

DELETE FROM table
WHERE row_id IN (SELECT ref_id FROM tmp_ref);

DROP TABLE tmp_ref;
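
The same three steps can be sketched from Python. This is only an illustration under assumptions: it uses the standard library's sqlite3 module instead of MySQL, the target table is called items because table is a reserved word, and delete_referenced is a hypothetical helper name. It shows the shape of the workaround, not its performance.

```python
import sqlite3


def delete_referenced(conn, a_value):
    """Materialise the subquery into a temporary table, delete, then clean up."""
    conn.execute("CREATE TEMP TABLE tmp_ref (ref_id INTEGER)")
    conn.execute(
        "INSERT INTO tmp_ref SELECT ref_id FROM ref_table WHERE a_col = ?",
        (a_value,),
    )
    cursor = conn.execute(
        "DELETE FROM items WHERE row_id IN (SELECT ref_id FROM tmp_ref)"
    )
    deleted = cursor.rowcount
    conn.execute("DROP TABLE tmp_ref")
    conn.commit()
    return deleted


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (row_id INTEGER PRIMARY KEY)")
    conn.execute("CREATE TABLE ref_table (ref_id INTEGER, a_col TEXT)")
    conn.executemany("INSERT INTO items VALUES (?)", [(i,) for i in range(1, 6)])
    conn.executemany(
        "INSERT INTO ref_table VALUES (?, ?)", [(1, "x"), (3, "x"), (4, "y")]
    )
    conn.commit()
    deleted = delete_referenced(conn, "x")
    remaining = [r[0] for r in conn.execute("SELECT row_id FROM items ORDER BY row_id")]
    conn.close()
    return deleted, remaining
```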

Run Python as service

Update: as of 2016, it's best to use systemd.

Using an event-driven Python framework such as Tornado, we can build a great system as an alternative to Node.js. However, keeping it running as a system service is not built in; we have to manage that ourselves. Luckily there is Supervisord (http://supervisord.org), written in Python, which does a great job of keeping our Python service up. It can supervise not only Python but almost any program.

Installation is trivial on Ubuntu with apt-get install supervisor. Other distros can use pip install supervisor. Below is the case of an Ubuntu server installation. We need to create a conf file in /etc/supervisor/conf.d/, for example a-svc.conf (note the file extension):

[program:a-svc]
process_name=a-svc
user=www-data
directory=/srv/svc/
command=/usr/bin/python3 -u -m tornado.autoreload /srv/svc/service.py
; The two settings below are the interesting ones
autostart=true
autorestart=true
redirect_stderr=true
stdout_logfile=/var/log/supervisor/svc.log

Then bring everything up with sudo service supervisor restart. Assuming our service listens on port 9876, we can test it with curl http://127.0.0.1:9876/a-query. To make this service public, we can add a config like this to Nginx:

server {
    # For public access
    location /api/ {
        rewrite ^/api/(.*)$ /$1 break;
        proxy_pass http://127.0.0.1:9876;
    }
}

We have it done nicely!
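
To check what path the service actually receives behind the /api/ prefix, the rewrite rule can be mimicked with Python's re module. This is a rough sketch: upstream_path is a hypothetical helper, and it only models the path part of the rewrite, not query strings or Nginx's URI normalisation.

```python
import re


def upstream_path(request_path):
    """Apply the same pattern as: rewrite ^/api/(.*)$ /$1 break;"""
    return re.sub(r"^/api/(.*)$", r"/\1", request_path)


examples = {
    path: upstream_path(path)
    for path in ("/api/a-query", "/api/v1/items", "/api/")
}
```

So a public request to /api/a-query reaches the service as /a-query, the same URL used in the curl test.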

Mar 4, 2014

Tunnel local web interface

Assume we have a local web admin (e.g. Solr Admin on port 8833) that we want to view in our browser. Because it's not exposed as a public server, we tunnel to it via SSH:

Forward a local port on our machine to the target port on the server, then open http://localhost:{local_port}

ssh -L {local_port}:localhost:{target_port} {user}@{host}
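
If the tunnel is started from a script, the placeholders can be filled in programmatically. A minimal sketch, assuming a hypothetical helper tunnel_command and made-up values (local port 8080, user deploy, host example.com); the resulting list can be handed to subprocess.run.

```python
import shlex


def tunnel_command(local_port, target_port, user, host):
    """Return the argument list for: ssh -L local:localhost:target user@host."""
    forward = f"{local_port}:localhost:{target_port}"
    return ["ssh", "-L", forward, f"{user}@{host}"]


# Example values are placeholders, not a real server.
command = tunnel_command(8080, 8833, "deploy", "example.com")
printable = shlex.join(command)
```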

Mar 3, 2014

Auto resize and optimize images

ImageMagick provides a command-line utility for this: mogrify. It overwrites the original files, so be careful with it.

The trick is how to find newly uploaded files. Here is a solution:

# Create a reference file to compare against (its timestamp defaults to now)
# To set the date manually: touch --date "2014-01-01" /tmp/fcmp~
touch /tmp/fcmp~
# Later, process the files added since the reference file was touched
find . -type f -newer /tmp/fcmp~ -name '*.jpg' -exec mogrify -resize 1200x1200 {} \;
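
The "newer than a reference file" selection can also be sketched in Python, which makes the comparison explicit: a file qualifies when its modification time is later than the marker's. The function name newer_jpgs and the demo files are assumptions for illustration; the resize step itself is left to mogrify.

```python
import os
import tempfile
import time


def newer_jpgs(root, marker):
    """Return .jpg files under root that were modified after the marker file."""
    cutoff = os.path.getmtime(marker)
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if name.endswith(".jpg") and os.path.getmtime(path) > cutoff:
                found.append(path)
    return sorted(found)


def demo():
    with tempfile.TemporaryDirectory() as root:
        marker = os.path.join(root, "fcmp~")
        old = os.path.join(root, "old.jpg")
        new = os.path.join(root, "new.jpg")
        note = os.path.join(root, "new.txt")
        for path in (marker, old, new, note):
            open(path, "w").close()
        now = time.time()
        # Set explicit timestamps: old.jpg < marker < new.jpg, new.txt
        os.utime(old, (now - 200, now - 200))
        os.utime(marker, (now - 100, now - 100))
        os.utime(new, (now, now))
        os.utime(note, (now, now))
        return [os.path.basename(p) for p in newer_jpgs(root, marker)]
```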

Laravel resources

http://laravel-recipes.com/

http://cheats.jesse-obrien.ca/

http://laravel.com/docs

Mar 2, 2014

For backup: Auto commit Git script

I'm a novice in Bash scripting. But after trying to write this by myself, I found Bash is really useful and interesting as well.

#!/bin/bash
# ------------------------------------------------------
# Git: auto-commit on a schedule
# ------------------------------------------------------
# Path to the list of sites
# Sites should be under git versioning
# Example: /srv/site-a should have:
#
# /srv/site-a/public <-- HTTP server points here
# /srv/site-a/.git
SRV_DIR="/srv"
# ------------------------------------------------------
for site in "$SRV_DIR"/*; do
    # Is it a directory with a git repository inside?
    if [[ -d "$site" && -d "$site/.git" ]]; then
        echo "Found: $site"
        GIT_BASE=(git --git-dir="$site/.git" --work-tree="$site")
        # Any output from --porcelain means there is something to commit
        if [[ -n "$("${GIT_BASE[@]}" status --porcelain)" ]]; then
            "${GIT_BASE[@]}" add "$site"
            "${GIT_BASE[@]}" commit -m "Auto-commit $(date)"
            "${GIT_BASE[@]}" push backup
        fi
    fi
done
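
For comparison, the decision logic of the script can be sketched in Python without running git at all. This is a sketch under assumptions: the function names git_base, has_changes and commit_plan are hypothetical, and the change check expects the text printed by git status --porcelain, which is empty when there is nothing to commit. The returned lists are the commands that would be passed to subprocess.run.

```python
def git_base(site):
    """Base git command pointing at a site's repository and work tree."""
    return ["git", f"--git-dir={site}/.git", f"--work-tree={site}"]


def has_changes(porcelain_output):
    """True when `git status --porcelain` printed at least one entry."""
    return any(line.strip() for line in porcelain_output.splitlines())


def commit_plan(site, porcelain_output, timestamp):
    """Return the git commands to run for one site, or [] if it is clean."""
    if not has_changes(porcelain_output):
        return []
    base = git_base(site)
    return [
        base + ["add", site],
        base + ["commit", "-m", f"Auto-commit {timestamp}"],
        base + ["push", "backup"],
    ]
```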

A backup strategy for websites

This article is about an attempt to back up files and MySQL databases on a Linux host.

1. First, create a directory for local git repositories, called /git. Set its permissions to 700, so it is unreadable by anyone except the owner.

2. For each site, use versioning:

cd /srv/web-a                      # Your site is located here
git init                           # This is the real, production tree
git init --bare /git/web-a         # This is a local backup
git add .
git remote add backup /git/web-a   # Name the remote "backup" and set its path
git commit -m "Initial"
git push backup master

In order to secure the .git folder, the document root in the HTTP server must be a subfolder such as /srv/web-a/public. Otherwise, we need to deny access to .git in the web server.

3. Whenever you make changes, commit and push them to the backup repository (keep it updated!). We can even schedule a daily commit using cron (done later). This keeps uploads and data up to date in the repository.

4. Database dumps will be stored in /git/db (although it's not a repository :D )
We need to create a user for dumping, with only the minimal permissions needed:

CREATE USER 'dump'@'localhost' IDENTIFIED BY 'your-password';
GRANT LOCK TABLES, REFERENCES, SELECT, SHOW VIEW ON *.* TO 'dump'@'localhost';

5. Make a script at /home/you/backup.sh and chmod 700 it (readable, writable, and executable only by you):

mysqldump --lock-tables=false -u dump -p"your-password" --all-databases \
| bzip2 > /git/db/all-$(date +%F).bz2
find /git/db/* -mtime +15 -exec rm {} \;
cd /srv/web-a
git add .
git commit -m "Auto-commit at $(date +%F)"
git push backup master

We can customize it to dump only the databases we want.
The most important option is --lock-tables=false. It lets the websites keep running while dumping.
The find command removes old backups that are more than 15 days old.
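
The retention rule can also be written out in Python to make the cutoff explicit. This is a rough sketch, not a drop-in replacement: prune_old_dumps is a hypothetical helper, and it removes files whose modification time is more than max_age_days ago, which is close to but not exactly how find rounds -mtime +15.

```python
import os
import tempfile
import time

SECONDS_PER_DAY = 86400


def prune_old_dumps(directory, max_age_days=15, now=None):
    """Delete regular files in directory older than max_age_days; return their names."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    removed = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path) and os.path.getmtime(path) < cutoff:
            os.remove(path)
            removed.append(name)
    return removed


def demo():
    with tempfile.TemporaryDirectory() as directory:
        now = time.time()
        # One dump aged 20 days and one aged 1 day (file names are made up)
        for name, age_days in (("all-old.bz2", 20), ("all-new.bz2", 1)):
            path = os.path.join(directory, name)
            open(path, "w").close()
            stamp = now - age_days * SECONDS_PER_DAY
            os.utime(path, (stamp, stamp))
        removed = prune_old_dumps(directory, now=now)
        return removed, sorted(os.listdir(directory))
```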

6. Finally, we need a cron entry to run it:

0 23 * * * /home/you/backup.sh 2>&1 >/dev/null

This runs every day at 23:00. Enjoy your night without worry! : )

7. When we are in trouble, we can use the power of git to restore the version we want. Currently the database dump is not optimized, because it's a full backup each time. An incremental backup solution would be better: it would save space and allow archiving a longer timespan.