Setting up a crawler development environment on CentOS
1. Install Python
For convenience, we install Python through the Anaconda distribution.
2. Install scrapy
pip install scrapy
After a successful installation, pip prints:
Successfully installed PyDispatcher-2.0.5 Twisted-16.6.0 attrs-16.3.0 constantly-15.1.0 cssselect-1.0.0 incremental-16.10.1 parsel-1.1.0 pyasn1-modules-0.0.8 queuelib-1.4.2 scrapy-1.2.1 service-identity-16.0.0 w3lib-1.16.0 zope.interface-4.3.2
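To confirm from Python that scrapy and its dependencies are installed, a minimal sketch along these lines may help. It assumes Python 3.8 or newer (for importlib.metadata), which differs from the Python 2.7 interpreter used in this post, and the helper name installed_version is hypothetical.

```python
from importlib import metadata  # available in Python 3.8+


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Package names taken from the pip output above.
    for name in ("scrapy", "Twisted", "parsel", "w3lib"):
        version = installed_version(name)
        print(name, version if version else "not installed")
```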
3. Install psycopg2, the PostgreSQL database adapter for Python
pip install psycopg2
The installation fails with the following error:
[jimmy@hadoop1 ~]$ pip install psycopg2
Collecting psycopg2
Using cached psycopg2-2.6.2.tar.gz
Complete output from command python setup.py egg_info:
running egg_info
creating pip-egg-info/psycopg2.egg-info
writing pip-egg-info/psycopg2.egg-info/PKG-INFO
writing top-level names to pip-egg-info/psycopg2.egg-info/top_level.txt
writing dependency_links to pip-egg-info/psycopg2.egg-info/dependency_links.txt
writing manifest file 'pip-egg-info/psycopg2.egg-info/SOURCES.txt'
warning: manifest_maker: standard file '-c' not found
Error: pg_config executable not found.
Please add the directory containing pg_config to the PATH
or specify the full executable path with the option:
python setup.py build_ext --pg-config /path/to/pg_config build ...
or with the pg_config option in 'setup.cfg'.
----------------------------------------
Solution: add the directory that contains pg_config to PATH, then rerun the installation:
export PATH=$PATH:/usr/pgsql/bin
pip install psycopg2
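Before rerunning pip, it can be useful to check whether pg_config is actually reachable. The following is a small sketch, not part of the original procedure: the function name find_pg_config and its extra_dirs parameter are hypothetical, and it assumes a POSIX system with Python 3.3 or newer (for shutil.which).

```python
import shutil


def find_pg_config(extra_dirs=()):
    """Return the full path to pg_config, or None if it cannot be found.

    Searches the current PATH first, then each directory in extra_dirs.
    """
    found = shutil.which("pg_config")
    if found:
        return found
    for directory in extra_dirs:
        # shutil.which accepts an explicit search path instead of PATH.
        found = shutil.which("pg_config", path=directory)
        if found:
            return found
    return None


if __name__ == "__main__":
    # /usr/pgsql/bin is the directory used in this post; adjust as needed.
    print(find_pg_config(["/usr/pgsql/bin"]))
```

If the function returns a path only because of extra_dirs, that directory is the one to append to PATH before running pip install psycopg2.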
4. Installation result:
[jimmy@hadoop1 ~]$ pip install psycopg2
Collecting psycopg2
Using cached psycopg2-2.6.2.tar.gz
Building wheels for collected packages: psycopg2
Running setup.py bdist_wheel for psycopg2 ... done
Stored in directory: /home/jimmy/.cache/pip/wheels/49/47/2a/5c3f874990ce267228c2dfe7a0589f3b0651aa590e329ad382
Successfully built psycopg2
Installing collected packages: psycopg2
Successfully installed psycopg2-2.6.2
You are using pip version 8.1.2, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
5. Installed locations:
python: /data/python/anaconda2/bin/python
scrapy: /data/python/anaconda2/bin/scrapy
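6. Troubleshooting:
The program connects to a PostgreSQL database, and it fails at run time with the error below. Running import psycopg2 in a client such as ipython raises the same error: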
/data/python/anaconda2/lib/python2.7/site-packages/psycopg2/__init__.py in <module>()
48 # Import the DBAPI-2.0 stuff into top-level module.
49
---> 50 from psycopg2._psycopg import BINARY, NUMBER, STRING, DATETIME, ROWID
51
52 from psycopg2._psycopg import Binary, Date, Time, Timestamp
ImportError: libpq.so.5: cannot open shared object file: No such file or directory
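The ImportError above means the dynamic loader could not locate libpq.so.5 when psycopg2's compiled extension was loaded. As a rough diagnostic sketch (the helper name can_load_shared_library is hypothetical), ctypes can be asked to open the same library name, which goes through the system loader's usual search:

```python
import ctypes


def can_load_shared_library(name):
    """Return True if the dynamic loader can open the given library name."""
    try:
        ctypes.CDLL(name)
    except OSError:
        # Raised when the library cannot be found or opened.
        return False
    return True


if __name__ == "__main__":
    print("libpq.so.5 loadable:", can_load_shared_library("libpq.so.5"))
```

If this prints False before the fix and True afterwards, import psycopg2 should no longer fail for this reason.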
Solution:
Link libpq.so.5.9 into a system shared library directory, for example /usr/lib64:
cd /usr/lib64
ln -s /usr/pgsql/lib/libpq.so.5.9 ./libpq.so.5
Note that the command must not be written as:
ln -s /usr/pgsql/lib/libpq.so.5 ./libpq.so.5
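In this setup, that form failed with a "too many levels of symbolic links" error. Inspecting /usr/pgsql/lib/libpq.so.5 shows that the file is itself a symbolic link pointing to /usr/pgsql/lib/libpq.so.5.9, so we point libpq.so.5 directly at /usr/pgsql/lib/libpq.so.5.9 instead.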
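To see how a link such as libpq.so.5 resolves, the chain of symbolic links can be walked step by step. This is an illustrative sketch only: the function symlink_chain and its max_hops limit are hypothetical, and the limit of 40 is an arbitrary choice, not the kernel's actual value.

```python
import os


def symlink_chain(path, max_hops=40):
    """Return the list of paths visited while following symlinks from path.

    Stops at the first path that is not a symlink. Raises RuntimeError after
    max_hops links, which is what a circular chain would run into.
    """
    chain = [path]
    current = path
    for _ in range(max_hops):
        if not os.path.islink(current):
            return chain
        target = os.readlink(current)
        # Relative targets are resolved against the link's own directory;
        # os.path.join keeps an absolute target unchanged.
        current = os.path.normpath(
            os.path.join(os.path.dirname(current), target)
        )
        chain.append(current)
    raise RuntimeError("too many levels of symbolic links: " + path)


if __name__ == "__main__":
    # Path taken from this post; it may not exist on other systems.
    print(symlink_chain("/usr/lib64/libpq.so.5"))
```

A chain that ends at a regular file such as libpq.so.5.9 is healthy; a RuntimeError here suggests the links point back at each other.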
Reference: http://stackoverflow.com/questions/12781566/error-while-loading-shared-libraries-libpq-so-5-cannot-open-shared-object-file
https://blog.csdn.net/embracejava/article/details/53384847