Setting Up a Web Crawler Development Environment on CentOS

1. Install Python

For convenience, we install Python through the Anaconda distribution.

2. Install Scrapy

pip install scrapy

After a successful installation, pip prints:

Successfully installed PyDispatcher-2.0.5 Twisted-16.6.0 attrs-16.3.0 constantly-15.1.0 cssselect-1.0.0 incremental-16.10.1 parsel-1.1.0 pyasn1-modules-0.0.8 queuelib-1.4.2 scrapy-1.2.1 service-identity-16.0.0 w3lib-1.16.0 zope.interface-4.3.2

3. Install psycopg2, the PostgreSQL database adapter for Python

pip install psycopg2

The installation fails with the following error:

[jimmy@hadoop1 ~]$ pip install psycopg2
Collecting psycopg2
  Using cached psycopg2-2.6.2.tar.gz
    Complete output from command python setup.py egg_info:
    running egg_info
    creating pip-egg-info/psycopg2.egg-info
    writing pip-egg-info/psycopg2.egg-info/PKG-INFO
    writing top-level names to pip-egg-info/psycopg2.egg-info/top_level.txt
    writing dependency_links to pip-egg-info/psycopg2.egg-info/dependency_links.txt
    writing manifest file 'pip-egg-info/psycopg2.egg-info/SOURCES.txt'
    warning: manifest_maker: standard file '-c' not found

    Error: pg_config executable not found.

    Please add the directory containing pg_config to the PATH
    or specify the full executable path with the option:

        python setup.py build_ext --pg-config /path/to/pg_config build ...

    or with the pg_config option in 'setup.cfg'.

    ----------------------------------------

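Solution: add the directory that contains pg_config to PATH (/usr/pgsql/bin on this machine), then run the installation again: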

export PATH=$PATH:/usr/pgsql/bin
pip install psycopg2
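Before changing PATH, it can help to confirm where pg_config actually lives. The following is a minimal sketch rather than part of the original steps: the helper name find_executable is hypothetical, and /usr/pgsql/bin is simply the directory used on this machine, so it should be adjusted to the local PostgreSQL installation.

```python
import os


def find_executable(name, search_dirs):
    """Return the first executable called `name` in `search_dirs`, or None."""
    for directory in search_dirs:
        if not directory:
            continue  # skip empty PATH entries
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


if __name__ == "__main__":
    path_dirs = os.environ.get("PATH", "").split(os.pathsep)
    # /usr/pgsql/bin is the location used in this post; it may differ elsewhere.
    found = find_executable("pg_config", path_dirs + ["/usr/pgsql/bin"])
    if found:
        print("pg_config found at: " + found)
    else:
        print("pg_config not found; locate it before installing psycopg2")
```

If the script reports a directory that is not yet on PATH, that directory is the one to append in the export command.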

4. Installation result:

[jimmy@hadoop1 ~]$ pip install psycopg2
Collecting psycopg2
  Using cached psycopg2-2.6.2.tar.gz
Building wheels for collected packages: psycopg2
  Running setup.py bdist_wheel for psycopg2 ... done
  Stored in directory: /home/jimmy/.cache/pip/wheels/49/47/2a/5c3f874990ce267228c2dfe7a0589f3b0651aa590e329ad382
Successfully built psycopg2
Installing collected packages: psycopg2
Successfully installed psycopg2-2.6.2
You are using pip version 8.1.2, however version 9.0.1 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.

5. Installed locations:

python:/data/python/anaconda2/bin/python
scrapy:/data/python/anaconda2/bin/scrapy
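To confirm that the interpreter in use is the Anaconda one listed above, a short check along these lines may be useful. It is a sketch: describe_interpreter is a hypothetical helper, and the "scripts_dir" entry assumes that command-line tools such as scrapy are installed next to the python executable, as they are in the paths shown here.

```python
import os
import sys


def describe_interpreter():
    """Collect basic facts about the Python interpreter running this script."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "version": "%d.%d.%d" % sys.version_info[:3],
        # Assumption: console scripts (e.g. scrapy) sit beside the interpreter.
        "scripts_dir": os.path.dirname(sys.executable),
    }


if __name__ == "__main__":
    info = describe_interpreter()
    for key in sorted(info):
        print(key + ": " + info[key])
```

If "executable" does not point into the Anaconda directory, a different Python is first on PATH and pip may have installed the packages elsewhere.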

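6. Troubleshooting:

The program needs to connect to a PostgreSQL database, and it fails at run time with the error below. The same error appears whenever import psycopg2 is executed, including in clients such as IPython: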

/data/python/anaconda2/lib/python2.7/site-packages/psycopg2/__init__.py in <module>()
     48 # Import the DBAPI-2.0 stuff into top-level module.
     49 
---> 50 from psycopg2._psycopg import BINARY, NUMBER, STRING, DATETIME, ROWID
     51 
     52 from psycopg2._psycopg import Binary, Date, Time, Timestamp

ImportError: libpq.so.5: cannot open shared object file: No such file or directory
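The failure can be reproduced outside IPython with a small import check, which is also handy for confirming the fix afterwards. This is an illustrative sketch; check_imports is a hypothetical helper, not something provided by Scrapy or psycopg2.

```python
import importlib


def check_imports(names):
    """Try to import each module; map its name to None or the error text."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as exc:
            # A missing shared library such as libpq.so.5 surfaces here too.
            results[name] = str(exc)
    return results


if __name__ == "__main__":
    for module, error in sorted(check_imports(["scrapy", "psycopg2"]).items()):
        print(module + ": " + ("OK" if error is None else "FAILED (" + error + ")"))
```

Because the modules are actually imported rather than just located, the check also catches packages that are installed but cannot load their native dependencies.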

Solution:
Link libpq.so.5.9 into a system shared-library directory, for example /usr/lib64:

cd /usr/lib64
ln -s /usr/pgsql/lib/libpq.so.5.9 ./libpq.so.5
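Whether the dynamic loader can now find the library can be checked directly from Python, without going through psycopg2. The sketch below uses ctypes.CDLL from the standard library; the helper name can_load is hypothetical.

```python
import ctypes


def can_load(library_name):
    """Return True if the dynamic loader can open `library_name`."""
    try:
        ctypes.CDLL(library_name)
        return True
    except OSError:
        # Raised when the loader cannot find or open the shared object.
        return False


if __name__ == "__main__":
    name = "libpq.so.5"
    print(name + (" loads correctly" if can_load(name) else " cannot be loaded"))
```

A result of "cannot be loaded" after creating the link suggests that the link target is wrong or that the directory is not on the loader's search path.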

Note that the link must not be created like this:

ln -s /usr/pgsql/lib/libpq.so.5 ./libpq.so.5

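Doing so fails with a "too many levels of symbolic links" error. Inspecting /usr/pgsql/lib/libpq.so.5 shows that it is itself a symbolic link to /usr/pgsql/lib/libpq.so.5.9, so we simply point libpq.so.5 directly at /usr/pgsql/lib/libpq.so.5.9.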
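To see where a link such as libpq.so.5 really ends up, the chain of symbolic links can be followed one hop at a time. This is a minimal sketch: symlink_chain and its max_hops parameter are hypothetical names, and the cap exists only so that a link loop ends the walk instead of running forever.

```python
import os


def symlink_chain(path, max_hops=40):
    """Return [path, target, target-of-target, ...] for a symlink chain."""
    chain = [path]
    while os.path.islink(path) and len(chain) <= max_hops:
        target = os.readlink(path)
        if not os.path.isabs(target):
            # Relative targets are resolved against the link's own directory.
            target = os.path.join(os.path.dirname(path), target)
        path = os.path.normpath(target)
        chain.append(path)
    return chain


if __name__ == "__main__":
    # Path taken from this post; adjust it to the local installation.
    for hop in symlink_chain("/usr/pgsql/lib/libpq.so.5"):
        print(hop)
```

The last entry printed is the file to use as the target of ln -s. A chain that reaches max_hops indicates a loop among the links.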
Reference: http://stackoverflow.com/questions/12781566/error-while-loading-shared-libraries-libpq-so-5-cannot-open-shared-object-file

https://blog.csdn.net/embracejava/article/details/53384847
