Beginning to learn TensorFlow

Published on 2016-10-05

Introduction: I have started learning machine learning and came across Google's open-source framework, TensorFlow.

1、Officially, TensorFlow seems to support only Linux, but I also tried to install it on Windows 10, following http://wiki.jikexueyuan.com/project/tensorflow-zh/
Some people reported success on Windows, but my attempt failed because the large file could not be downloaded, even through a US IP address. I am recording the method anyway.

I got the Docker installation on Windows working, drawing on a lot of the information posted above.
1. Install Docker.
2. Open the Docker Quickstart Terminal. Once it has started, it prints:
    docker is configured to use the default machine with IP 192.168.99.100
3. Install TensorFlow: docker run -d -p 8888:8888 -v /notebook:/notebook xblaster/tensorflow-jupyter
4. Run tensorflow-jupyter: docker run xblaster/tensorflow-jupyter
     It reports "running at: http://0.0.0.0:8888". I do not know why it shows this address; the browser cannot open it. Replacing it with the IP shown when Docker started, http://192.168.99.100:8888, works.
5. Run the example code. MNIST references: 1, the official TensorFlow documentation; 2, http://blog.csdn.net/yhl_leo/article/details/50614444 and its MNIST repository on GitHub, https://github.com/yhlleo/mnist, which contains the important file input_data.py.

2、I ran it on Ubuntu and tested a simple example, fitting the line:

y = 0.1*x + 0.3

After 200 training iterations I got the result:

The result shows that after about 40 iterations the fitted line is already close to 0.1*x + 0.3. No wonder AlphaGo is so strong.
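
The fitting experiment can be reproduced without TensorFlow. The following is a minimal sketch in plain Python, not the TensorFlow code that was actually run: it assumes 100 random x values in [0, 1), a mean-squared-error loss, and a learning rate of 0.5, and the names fit_line, xs and ys are made up for illustration.

```python
import random


def fit_line(xs, ys, learning_rate=0.5, steps=200):
    """Fit y = w*x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    history = []  # (w, b) after every step, to inspect how fast it converges
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
        history.append((w, b))
    return w, b, history


rng = random.Random(0)  # fixed seed so the run is repeatable
xs = [rng.random() for _ in range(100)]
ys = [0.1 * x + 0.3 for x in xs]  # the target line from the post

w, b, history = fit_line(xs, ys)
print("after 40 steps: ", history[39])
print("after 200 steps:", (w, b))
```

Printing the intermediate values from history makes it easy to check how close the estimate already is after 40 steps.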

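Getting started with NumPy and pandas, plus 6 must-read data analysis books and 2 English tutorials

Published on 2016-10-02

Introduction: Today I looked into cmder's color schemes and found a nice shell build, thumbs up. Also, there is no getting around English: in the end the material you want always turns out to exist only in English. :)

1、Over the past few days I found time to read several chapters of 《利用python进行数据分析》 (Python for Data Analysis). I read the pandas part twice and got familiar with some commands and usage.

2、A friend abroad suggested using JS for a certain problem. At first I followed the advice on the page below, but after finding an introductory JS book I realized the content was not what interested me, so for now I am just noting it here.

http://kb.cnblogs.com/page/191787/

3、While searching for material I also found a foreign site that introduces several good data analysis books: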

Must have books for data scientists (or aspiring ones)

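4、The link above gives English introductions to 6 books. R seems to dominate, and one of the books is R-based but has third-party Python code available. (Note that some hyperlinks were lost when the text was pasted.)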

1. R Cookbook by Paul Teetor

This is simply the best book to start your journey with R. It contains tons of examples and practical advice on a wide range of topics like file input / output, data manipulations, merging and sorting to building a regression model. For a starter in R, this book becomes your best pal during the initial testing time.

While the book is aimed towards starters, it still remains a prominent feature of the library of any data scientist.



2. Machine Learning for Hackers by Drew Conway & John Myles White

I think this book actually has a wrong title. I dropped purchasing it twice before giving it a shot (which happened only because of a recommendation from a close friend). This book is meant for data scientists and not hackers. I don’t know why the title says so. A very practical manual for learning machine learning, it comes with good visuals and you can get a copy of codes in Python (original book is based on R).



3. R graphics cookbook by Winston Chang

You can’t be a good data scientist unless you master the graphics in R! There is no better way for visualization, but to learn ggplot2. Sadly, learning ggplot2 might seem like learning a completely new language in itself. This is where this “cookbook” comes to rescue. The recipes from Winston are short, sweet and to the point. Buy this and it is bound to end up as one of the most referred book in your library.



4. Programming Collective Intelligence by Toby Segaran (popularly referred as PCI)

If there is one book you want to choose, out of this selection (for learning machine learning) – it is this one. I haven’t met a data scientist yet who has read this book and does not recommend to keep it on your bookshelf. A lot of them have re-read this book multiple times. The book was written long before data science and machine learning acquired the cult status they have today – but the topics and chapters are entirely relevant even today! Some of the topics covered in the book are collaborative filtering techniques, search engine features, Bayesian filtering and Support vector machines. If you don’t have a copy of this book – order it as soon as you finish reading this article! The book uses Python to deliver machine learning in a fascinating manner.



5. Python for Data Analysis by Wes McKinney

Written by Wes McKinney, this book teaches you everything you need about Pandas. For the starters (not sure why you are still reading this article), pandas are Python’s way to handle data structures. Except for the title of the book (which I find misleading), I like everything else about this book. It contains ample codes and examples to leave you capable of performing any operation / transformation on a dataframe in Python (using pandas).

For the advanced users, if you already know pandas, you should look at this presentation from Wes on what are the shortcomings of pandas.



6. Agile data science by Russell Jurney

A recent addition by O’Reilly, this book looks like a must read for data scientists. The focus is on using “light” tools, which are easy to use and still get the work done. This is currently on my reading list and I’ll update more details once I have read it.



These are the 6 must have books, if you are serious about being a data scientist. There are a couple of additional Python books, which you can consider – Natural Language processing with Python by Steven Bird et al and Mining the social web by Matthew A. Russell. The reason I have not kept them in the list is because you can find a lot of the information in these books easily on the web.

5、In addition, here are 2 good English tutorials on data analysis with Python and pandas:

A Complete Tutorial to Learn Data Science with Python from Scratch

And this one:

Data Munging in Python (using Pandas) – Baby steps in Python

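An email reply from the author of Python for Data Analysis

Published on 2016-09-27

Introduction: I received an email reply from the author of Python for Data Analysis :)

1、A while ago I emailed him suggesting that the next edition be written around Anaconda. It took a few days, but I did get a reply. It looks like he also sees Anaconda as the better choice, and he said the second edition will use it: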

try

from matplotlib.pyplot import *

and running

%matplotlib

I am creating a 2nd edition of the book, and it will use Anaconda
instead of Canopy in the instructions.

Thanks!

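2、I then asked a few people in the Python Chinese community, who said they would be interested in translating it. Since the Chinese translation of the first edition was published by China Machine Press (机械工业出版社), I emailed them to ask whether they plan to publish a Chinese translation of the second edition, but I have not received a reply yet. If they do not translate it, I will gather people from the community to do it.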

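Fixing garbled Chinese text in matplotlib

Published on 2016-09-27

Introduction: how to fix garbled Chinese text in matplotlib :)

1、Add the following lines at the top of the source file:

from pylab import *
mpl.rcParams['font.sans-serif'] = ['SimHei']  # set the default font

mpl.rcParams['axes.unicode_minus'] = False  # fix the minus sign '-' showing as a box in saved figures

2、The above only applies to a single .py file. To apply it globally, do the following:

Open \Lib\site-packages\matplotlib\mpl-data\matplotlibrc in any text editor (ideally back it up first).
Find line 129, #font.family, uncomment it, and change the value after the colon to Microsoft YaHei. (The line numbers may differ between matplotlib versions.)
Find line 141, #font.sans-serif, uncomment it, and put Microsoft YaHei at the very front of the list after the colon, followed by an ASCII comma (,).
Also set axes.unicode_minus to False, which fixes the minus sign '-' showing as a box in saved figures.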

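How to convert a Kindle e-book to Word and other formats

Published on 2016-09-25

Introduction: a walkthrough of converting a Kindle e-book to Word and other formats :)

1、Follow the steps in this Zhihu thread: http://www.zhihu.com/question/38451995

2、Download the cracked version of the DRM removal tool, otherwise it only handles 3 books. See: http://www.d9soft.com/soft/102882.htm

3、Use this site to convert to PDF: http://www.epubconverter.com/azw-to-pdf-converter/

4、I successfully converted 《利用Python进行数据分析》 (Python for Data Analysis) to PDF. Of course, I first spent the princely sum of 0.1 yuan to sign up for a Kindle Unlimited account.

5、To make it easy to copy the code examples from the book, I then used calibre to convert it to docx. No more squinting at scanned copies from the web or the paper book and painfully typing in every command by hand.

6、Example screenshot:

7、For the lazy among you, I have already uploaded the book, so you can download the Word version directly. Click [Download here].

8、Remember to rename the file so it is easy to find later.

9、For the SPX index example in chapter 8, here is the code as I corrected it to run in PyCharm: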

import matplotlib.pyplot as plt
from datetime import datetime
import pandas as pd

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
# Raw string so the backslashes in the Windows path are not read as escape sequences
data = pd.read_csv(r'D:\mine\pydata-book-master\ch08/spx.csv', index_col=0, parse_dates=True)
spx = data['SPX']
spx.plot(ax=ax, style='k-')
crisis_data = [
    (datetime(2007, 10, 11), 'Peak of bull market'),
    (datetime(2008, 3, 12), 'Bear Stearns Fails'),
    (datetime(2008, 9, 15), 'Lehman Bankruptcy'),
]
for date, label in crisis_data:
    ax.annotate(label, xy=(date, spx.asof(date) + 50),
                xytext=(date, spx.asof(date) + 200),
                arrowprops=dict(facecolor='black'),
                horizontalalignment='left', verticalalignment='top')
# Zoom in on 2007-2010 (set once, after the loop)
ax.set_xlim(['1/1/2007', '1/1/2011'])
ax.set_ylim([600, 1800])
ax.set_title('Important dates in 2008-2009 financial crisis')

plt.show()

The resulting chart. Looks a bit professional, doesn't it?

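Numerical analysis and notes on fixing a bug [python]

Published on 2016-09-23

Introduction: numerical analysis and notes on fixing a bug :)

1、This bug appeared while using Python 3.5.2. I had Anaconda installed alongside it and had also installed and uninstalled Python 2 from time to time to test Python 2 code, and eventually ended up with: Fatal error in launcher: Unable to create process using '"'

2、The fix: uninstall Python 3 from the Control Panel, then delete the leftover folders in the Python 3 install path. With only the Anaconda version left, typing ipython in cmder runs normally.

3、As for numerical analysis, I feel I should read some relevant books first, even though I had some earlier exposure through MATLAB. I started with the well-reviewed 《利用Python进行数据分析》 (Python for Data Analysis), but found a problem, so I immediately emailed the author, as shown in the screenshot below: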


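Python for Data Analysis: scraping and charting the Douban Movie Top 250 [python]

Published on 2016-09-23

Introduction: Back to Douban again. What a crawler-friendly site :)

1、This time the target is the Top 250 at https://movie.douban.com/top250 and the goal is to count the genres of these 250 movies.

2、Someone else has already done this analysis. I happened to see it and found the code fairly clean, so I am studying it and reviewing my own progress along the way. As usual, I will go through some of the details. First, the original code: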

# -*- coding: utf-8 -*-
# !/usr/bin/env python

from lxml import etree
import requests
import pymysql
import matplotlib.pyplot as plt
from pylab import *
import numpy as np

# Connect to the MySQL database
conn = pymysql.connect(host = 'localhost', user = 'root', passwd = '54545454', db = 'douban', charset = 'utf8')
cur = conn.cursor()
cur.execute('use douban')

def get_page(i):
    url = 'https://movie.douban.com/top250?start={}&filter='.format(i)

    html = requests.get(url).content.decode('utf-8')

    selector = etree.HTML(html)
    # //*[@id="content"]/div/div[1]/ol/li[1]/div/div[2]/div[2]/p[1]
    content = selector.xpath('//div[@class="info"]/div[@class="bd"]/p/text()')
    print(content)

    for i in content[1::2]:
        print(str(i).strip().replace('\n\r', ''))
        # print(str(i).split('/'))
        i = str(i).split('/')
        i = i[len(i) - 1]
        # print('zhe' +ｉ)
        # print(i.strip())
        # print(i.strip().split(' '))
        key = i.strip().replace('\n', '').split(' ')
        print(key)
        for i in key:
            if i not in douban.keys():
                douban[i] = 1
            else:
                douban[i] += 1

def save_mysql():
    print(douban)
    for key in douban:
        print(key)
        print(douban[key])
        if key != '':
            try:
                sql = 'insert douban(类别, 数量) value(' + "\'" + key + "\'," + "\'" + str(douban[key]) + "\'" + ');'
                cur.execute(sql)
                conn.commit()
            except:
                print('插入失败')
                conn.rollback()


def pylot_show():
    sql = 'select * from douban;'
    cur.execute(sql)
    rows = cur.fetchall()
    count = []
    category = []

    for row in rows:
        count.append(int(row[2]))
        category.append(row[1])
    print(count)
    y_pos = np.arange(len(category))
    print(y_pos)
    print(category)
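    colors = np.random.rand(len(count))
    # Draw one horizontal bar per category (a bare plt.barh() with no arguments raises TypeError)
    plt.barh(y_pos, count, align='center', alpha=0.4)
    plt.yticks(y_pos, category)
    # Label each bar with its count; separate loop names avoid shadowing count and y_pos
    for c, y in zip(count, y_pos):
        plt.text(c, y, str(c), horizontalalignment='center', verticalalignment='center', weight='bold')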
    plt.ylim(+28.0, -1.0)
    plt.title(u'豆瓣电影250')
    plt.ylabel(u'电影分类')
    plt.subplots_adjust(bottom = 0.15)
    plt.xlabel(u'分类出现次数')
    plt.savefig('douban.png')


if __name__ == '__main__':
    douban = {}
    for i in range(0, 250, 25):
        get_page(i)
    save_mysql()
    pylot_show()
    cur.close()
    conn.close()

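3、First, he uses selector = etree.HTML(html), while I mostly use bs4; both roads should lead to the same place. His XPath is:

content = selector.xpath('//div[@class="info"]/div[@class="bd"]/p/text()')

whereas I used the lazy approach of copying the path straight from the browser:

selector.xpath('//*[@id="content"]/div/div[1]/ol/li[1]/div/div[2]/div[2]/p[1]/text()')

Note that the copied path stops at p[1]; since we need the text inside it, /text() is appended.

4、This gives the text of the movie information on the first page. But if you step through it in IPython, the formatting is simply outrageous, as shown: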

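5、The text has to be extracted and cleaned.

First, given this big blob, to keep things simple we borrow the idea of unit testing and debug only the first entry on the first page. So the URL, previously set as:

url = 'https://movie.douban.com/top250?start={}&filter='.format(i)

is changed to the following (start=0 is the first page):

url = 'https://movie.douban.com/top250?start=0&filter='

Also note that the output of the whole loop is a list in which every element is a string. The strings corresponding to the first movie are:

'\n                            导演: 弗兰克·德拉邦特 Frank Darabont\xa0\xa0\xa0主演: 蒂姆·罗宾斯 Tim Robbins /...', '\n                            1994\xa0/\xa0美国\xa0/\xa0犯罪 剧情\n                        '

Compared with the original page, the first string holds the director and cast, and the second holds the year, country and genres. For this post we need the second one, i.e. "1994 / 美国 / 犯罪 剧情". Since content holds the text for all 25 movies in one list, the strings we need to process further sit at the odd indices 1, 3, 5, ..., because list indices start at 0. That is the point of the first for loop, which uses Python slicing: the value before the first colon says to start at index 1, and the value after the second colon says to step by 2: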

for i in content[1::2]:
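
The slice can be tried on a small list first. This is an illustrative sketch only: the list below is a made-up stand-in for the scraped content, with people at the even indices and year/country/genre at the odd ones.

```python
# Hypothetical stand-in for the scraped list: two strings per movie
content = [
    "director and cast of movie 1", "1994 / US / Crime Drama",
    "director and cast of movie 2", "1993 / CN / Drama Romance",
    "director and cast of movie 3", "1994 / US / Drama Romance",
]

# Start at index 1 and take every second element: indices 1, 3, 5
genre_lines = content[1::2]

for line in genre_lines:
    print(line)
```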

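Look at the first statement inside the loop. Note that it only prints; it does not modify the original string:

print(str(i).strip().replace('\n\r', ''))

This takes each element at index 1, 3, 5, ... of the content list, removes leading and trailing whitespace with strip(), and then uses replace() to swap '\n\r' for the empty string. We can verify this in IPython:

Set aa = content[1]:

aa =  '\n                            1994\xa0/\xa0美国\xa0/\xa0犯罪 剧情\n                        '

Run aa.strip() to get: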

In [19]: aa.strip()
Out[19]: '1994\xa0/\xa0美国\xa0/\xa0犯罪 剧情'

In [20]: aa.strip().replace('\n\r', '')
Out[20]: '1994\xa0/\xa0美国\xa0/\xa0犯罪 剧情'

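The replace() call appears to do nothing, simply because this string contains no '\n\r'. You may also notice that \xa0 appears in 4 places. What is it? It is an escape sequence: "\x" followed by two hexadecimal digits denotes the character with that code, and \xa0 (U+00A0) is the non-breaking space, which prints as a space.

The statements that follow in the loop all serve to get rid of \n, spaces and other useless characters. i = str(i).split('/') splits the string on '/' into a list. Since the earlier print did not change i, the pieces still carry their surrounding whitespace.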
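
The behaviour of \xa0 can be checked directly. The sketch below uses only built-in str methods; the raw string is the second string of the first movie with its indentation shortened, and the variable names are made up.

```python
nbsp = '\xa0'  # the same character as '\u00a0', the non-breaking space
raw = '\n    1994\xa0/\xa0美国\xa0/\xa0犯罪 剧情\n    '

# Python 3 counts the non-breaking space as whitespace ...
print(nbsp.isspace())

# ... so strip() removes it at the ends but leaves the inner ones untouched
stripped = raw.strip()
print(repr(stripped))

# split(' ') only splits on the ordinary space, split() on any whitespace
print('a\xa0b'.split(' '))
print('a\xa0b'.split())
```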

In [35]: aa =  '\n                            1994\xa0/\xa0美国\xa0/\xa0犯罪 剧情\n                        '

In [36]: aa = aa.split('/')

In [37]: aa = aa(len(aa)-1)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-37-06f75198160a> in <module>()
----> 1 aa = aa(len(aa)-1)

TypeError: 'list' object is not callable

In [38]: aa = aa[len(aa)-1]

In [39]: aa
Out[39]: '\xa0犯罪 剧情\n                        '

Then, with:

In [40]: key = aa.strip().replace('\n', '').split(' ')

In [41]: key
Out[41]: ['犯罪', '剧情']

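As you can see, strip() has removed the leading \xa0, treating it as whitespace on the left; it displays as a space anyway. Splitting on the space in the middle then gives a list of 2 strings.
This completes the cleaning of the data we need.

The code that follows is a for loop over key, which builds a dict named douban whose values will later be loaded into the database.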

for i in key:
    if i not in douban.keys():
        douban[i] = 1
    else:
        douban[i] += 1

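The logic: if a movie's genre keyword does not yet exist in the douban dict (at this point the data has not been loaded into the database), a new key is created for it with the value 1; otherwise, if it already exists, its value is incremented by 1.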
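
The same counting can be written more compactly with collections.Counter. This is a sketch of an alternative, not the original author's code: extract_genres is a hypothetical helper, and the sample list reuses the first movie's string plus one made-up entry.

```python
from collections import Counter


def extract_genres(info):
    """Return the genre words from a 'year / country / genres' string."""
    last_field = info.split('/')[-1]
    # split() without arguments drops \xa0, newlines and repeated spaces
    return last_field.split()


# Hypothetical sample: people at even indices, year/country/genres at odd ones
content = [
    '\n  (director and cast)\n',
    '\n  1994\xa0/\xa0美国\xa0/\xa0犯罪 剧情\n  ',
    '\n  (director and cast)\n',
    '\n  1993\xa0/\xa0中国大陆\xa0/\xa0剧情 爱情\n  ',
]

douban = Counter()
for info in content[1::2]:
    douban.update(extract_genres(info))  # adds 1 for every genre word

print(douban)
```

A Counter starts missing keys at 0, so the explicit "not in douban.keys()" branch is no longer needed.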

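5、The second function, def save_mysql():, loads the extracted data into the database.
The insert statement is written in a rather idiosyncratic way. It uses so many quotes that I would not recommend it; the usual approach is a %s placeholder in the SQL with the values passed separately.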
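
A placeholder-based insert might look like the sketch below. To keep it runnable without a MySQL server it uses the standard-library sqlite3 module with an in-memory database, which is an assumption made for illustration; sqlite3 uses ? as its placeholder, whereas pymysql uses %s, and the English table and column names here are made up.

```python
import sqlite3

# Hypothetical counts, including the empty key the original code skips
douban = {'剧情': 3, '犯罪': 1, '': 2}

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE douban (category TEXT, count INTEGER)')

# The driver fills in the placeholders, so no manual quoting is needed
rows = [(key, value) for key, value in douban.items() if key != '']
conn.executemany('INSERT INTO douban (category, count) VALUES (?, ?)', rows)
conn.commit()

stored = dict(conn.execute('SELECT category, count FROM douban'))
print(stored)
conn.close()
```

With pymysql the equivalent call would pass the SQL with %s placeholders and a tuple of values as the second argument to cur.execute.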

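6、The last function draws the chart. This part relies on NumPy, pandas and so on, which I need to keep studying.

The statistics show that the vast majority of good movies win people over with their story (剧情, drama):

7、Reference: 爬取豆瓣电影top250提取电影分类进行数据分析 (scraping the Douban Movie Top 250 and extracting genres for data analysis)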

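Testing CAPTCHA recognition [python]

Published on 2016-09-22

Introduction: a test of CAPTCHA recognition; for certain reasons I needed a script that registers multiple users automatically

1、The blogs I consulted recommend Python+Selenium+PIL+Tesseract. Others recommend Pytesser, but installing Pytesser is quite a pain.

2、At first I followed:
http://www.th7.cn/Program/Python/201602/768304.shtml

and this one:

http://blog.csdn.net/lanfan_11/article/details/45558573

3、But after half a day of testing, whichever method I followed, I could not import pytesser. At first I thought it had worked, but it turned out I had been running the test script, following his example, directly inside the pytesseract-v0.0.1 folder, so of course it worked there.

4、This is the first third-party library I have met that needs this much fiddling, and it stopped being updated years ago, so I simply dropped the .py file I needed to write into the pytesser folder as a workaround.

5、Later I found that some blogs recommend pytesseract. Judging by the name it is a successor to pytesser that is still maintained, and sure enough it installed without trouble, so that is the one. The updated toolchain is therefore: Python+Selenium+PIL+pytesseract+Tesseract

6、What remains is how to recognize the CAPTCHA. The test site serves it as a dynamic .aspx image, so every request to the URL returns a different CAPTCHA.
So I went all the way and took a screenshot, computed the CAPTCHA's position in it, and recognized that region on its own. (Another workable approach is to use cookies.)

The process: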

# coding:utf-8
# python 3.5.2


from selenium.webdriver.support import ui as ui
from selenium.webdriver.common.keys import Keys  # needed for the Keys constants
from selenium.webdriver.common.action_chains import ActionChains
from selenium import webdriver
import pytesseract
import time
from PIL import Image,ImageEnhance

num = 400
# wait = ui.WebDriverWait(browser, 10)
# One account was already tested by hand, so start from the second
for i in range(2, num+1):

    name_cn = 'aaa'        
    browser = webdriver.PhantomJS()
    browser.maximize_window()
    url = "http://www.wangzhi.com"
    browser.get(url)
    wait = ui.WebDriverWait(browser, 20)
    wait.until(lambda browser:browser.find_element_by_xpath("//*[@id=\"ussheng\"]"))
    shs1 = browser.find_element_by_xpath("//*[@id=\"ussheng\"]").send_keys("shengfen")
    shs2 = browser.find_element_by_xpath("//*[@id=\"uscity\"]").send_keys("xxshi")
    mhq = browser.find_element_by_xpath("//*[@id=\"usxian\"]").send_keys("yyqu")
    nling = browser.find_element_by_xpath("//*[@id=\"inscage\"]").send_keys("nianling")
    dwei = browser.find_element_by_xpath("//*[@id=\"worker\"]").send_keys("dizhi")
    name_input = browser.find_element_by_xpath("//*[@id=\"ustruename\"]").send_keys(name_cn + '%d'%i)

    # Choose the gender from the drop-down list
    sex = browser.find_element_by_xpath("//*[@id=\"DropDownList1\"]")
    sex.find_element_by_xpath("//*[@id=\"DropDownList1\"]/option[2]").click()

    # CAPTCHA recognition, first idea: right-click to save the CAPTCHA image, then recognize the digits or letters. The save could not be carried out, so this failed.
    ###############################################################################
    # # xpath： //*[@id="ValidImage"]
    # shi_bie_ma = browser.find_element_by_xpath("//*[@id=\"ValidImage\"]")
    # action = ActionChains(browser).move_to_element(shi_bie_ma)
    # action.context_click(shi_bie_ma)
    # action.send_keys(Keys.ARROW_DOWN)
    # action.send_keys('v')
    # action.perform()
    ###############################################################################



    # Second idea: download the image and recognize it. The .aspx endpoint refreshes the code every time the URL is opened, so this failed too. So take a screenshot from the start.

    # Take the screenshot
    browser.get_screenshot_as_file('C:\\van\\image1.jpg')  # straightforward

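    # Locate the CAPTCHA's coordinates
    imgelement = browser.find_element_by_xpath('//*[@id="ValidImage"]')  # locate the CAPTCHA element
    location = imgelement.location  # x, y coordinates of the CAPTCHA

    size = imgelement.size  # width and height of the CAPTCHA
    # Crop box (left, upper, right, lower); named box so the built-in range is not shadowed
    box = (int(location['x']), int(location['y']), int(location['x'] + size['width']), int(location['y'] + size['height']))

    im = Image.open('C:\\van\\image1.jpg')
    # Crop the region
    region = im.crop(box)  # region is now a new image object
    # region.show()  # showing it keeps the file in use, so leave this commented out
    region.save("C:\\van\\image2.jpg")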


    # -------- image enhancement + automatic recognition of a simple CAPTCHA --------
    # time.sleep(3)  # guards against starting recognition before the image is saved on a slow connection

    im = Image.open("C:\\van\\image2.jpg")
    imgry = im.convert('L')  # convert to grayscale
    sharpness = ImageEnhance.Contrast(imgry)  # contrast enhancement
    sharp_img = sharpness.enhance(2.0)

    sharp_img.save("C:\\van\\image2.jpg")

    #http://www.cnblogs.com/txw1958/archive/2012/02/21/2361330.html

    # imgry.show()  # used during step-by-step testing; comment out for the full run
    #imgry.save("E:\\image_code.jpg")
    im = Image.open("C:\\van\\image2.jpg")
    code = pytesseract.image_to_string(im)  # code is the recognized text, as a str
    print(code)
    # Print code to check whether it was recognized correctly
    #-------------------------------------------------------------------
    code_input = browser.find_element_by_xpath("//*[@id=\"txtCheckCode\"]").send_keys(code)
    # login
    browser.find_element_by_xpath("//*[@id=\"Button1\"]").click()
    print("finished,%d"%i)
    browser.quit()

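7、Analysis of the program:

Taking a screenshot and then recognizing the CAPTCHA may not be the most elegant approach, but the element's imgelement.location gives the CAPTCHA's x and y coordinates, which yields an exact crop region. Tuning it by hand is possible too, but then you have to locate the region as precisely as you can with the mouse in Paint.
Image enhancement was added, which raises the chance of recognizing the CAPTCHA.
A weakness of this code is that it does not check whether the submission failed after the recognized CAPTCHA was sent. This is mainly because the site under test is so poorly built: after the click it does not report right away whether the code was right or wrong, it just hangs.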
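
The crop region is plain arithmetic on the element's position and size. A minimal sketch, assuming location and size are dicts shaped like the ones Selenium returns; crop_box is a hypothetical helper name and the numbers are made up.

```python
def crop_box(location, size):
    """Build the (left, upper, right, lower) tuple that PIL's Image.crop expects."""
    left = int(location['x'])
    upper = int(location['y'])
    right = int(location['x'] + size['width'])
    lower = int(location['y'] + size['height'])
    return (left, upper, right, lower)


# Made-up values standing in for imgelement.location and imgelement.size
box = crop_box({'x': 100.5, 'y': 200}, {'width': 60, 'height': 22})
print(box)
```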

8、Summary: there are plenty of comments recording the thinking behind the steps. I learned basic CAPTCHA recognition.

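Building an IP pool

Published on 2016-09-21

Introduction: While testing the Baidu API today I spent ages chasing a bug, only to find that my IP had been blocked. So I went looking for a way to build an IP pool.

1、Someone has already written code for this, so I only needed to follow it.

2、A few details are still worth recording: you do not run this program by just hitting run on main.py; extra commands have to be entered in cmd.

3、IPs that pass the check are saved to MongoDB automatically. Along the way, a software compatibility problem may cause an error in the bson module; remember to upgrade mongoengine to the latest version.

4、However, out of 400 IPs collected, only 10 came back OK.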
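
The post does not show how the referenced program tests a proxy. As a rough sketch of one way to do it with the standard library only: check_proxy and filter_working are hypothetical names, and the test URL and timeout are assumptions.

```python
import http.client
import urllib.request


def check_proxy(proxy, test_url="http://example.com/", timeout=5):
    """Return True if test_url can be fetched through the 'host:port' proxy."""
    handler = urllib.request.ProxyHandler({"http": "http://" + proxy})
    opener = urllib.request.build_opener(handler)
    try:
        with opener.open(test_url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        # Refused connections, timeouts and broken responses all count as dead
        return False


def filter_working(proxies, check=check_proxy):
    """Keep only the proxies for which check() returns True."""
    return [p for p in proxies if check(p)]


# Usage with a stand-in checker, so nothing touches the network here
candidates = ["1.2.3.4:8080", "5.6.7.8:3128", "9.9.9.9:80"]
alive = filter_working(candidates, check=lambda p: p.endswith(":3128"))
print(alive)
```

Passing the checker in as a parameter makes it easy to swap in a stricter test, for example one that fetches the real target site.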

Reference: Scrapy爬取美女图片第三集代理ip(上) (原创)

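Missing bson module: it turned out to be a mongoengine pitfall

Published on 2016-09-21

Introduction: Running someone else's program, I was told the bson module was missing, so naturally I installed it, only to find that it had no code module under it

1、It turns out this module is tied to MongoDB. Installing the standalone bson package leaves some files missing because of version differences.

2、The official MongoDB side also advises against installing bson separately:

Checking the official PyMongo documentation, the very first lines are:
Warning Do not install the “bson” package. PyMongo comes with its own bson package; doing “pip install bson” or “easy_install bson” installs a third-party package that is incompatible with PyMongo.
PyMongo has no required dependencies.

3、Fortunately, installing the latest mongoengine solved the problem in the end.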
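
One way to spot this situation is to ask which distributions are installed. The sketch below uses importlib.metadata from the standard library (Python 3.8 or later, newer than the Python version used in this post); is_distribution_installed is a hypothetical helper name.

```python
from importlib import metadata


def is_distribution_installed(name):
    """Return True if a pip-installed distribution with this name is present."""
    try:
        metadata.version(name)
        return True
    except metadata.PackageNotFoundError:
        return False


# A standalone "bson" distribution next to "pymongo" is the conflict the warning describes
if is_distribution_installed("bson") and is_distribution_installed("pymongo"):
    print("Standalone bson is installed alongside pymongo; consider removing it.")
else:
    print("No standalone bson / pymongo conflict detected.")
```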
