GraphLab: Installation and Usage



王成军

wangchengjun@nju.edu.cn

Computational Communication (计算传播网) http://computational-communication.com

Problem

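GraphLab Create can only be installed into older versions of Anaconda. Forcing the installation into a newer version also breaks the kernel of Anaconda's Jupyter Notebook, so running GraphLab directly in the main Anaconda installation is ruled out.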

Register for Academic Use of GraphLab Create

https://turi.com/download/academic.html

Renew Academic License for GraphLab Create

https://turi.com/download/renew.html

License Renewal Confirmation

Your academic license for GraphLab Create has been renewed. Please restart GraphLab Create while connected to the internet.

Email: wangchengjun@nju.edu.cn

Expiration Date: 03-13-2019

Python 2.7.x

Installing GraphLab Create requires a Python 2.7.x environment, pip version >= 7, and Anaconda2 v4.0.0 (64-bit). IPython Notebook is recommended.
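The version requirements above can be checked programmatically. The following is a minimal sketch and not part of GraphLab Create: the helper names parse_version and meets_requirements are hypothetical, and version strings are assumed to be plain dotted integers such as 2.7.14 or 9.0.1.

```python
def parse_version(text):
    """Turn a dotted version string such as '2.7.14' into a tuple of ints."""
    return tuple(int(part) for part in text.split('.'))


def meets_requirements(python_version, pip_version):
    """Return True if Python is 2.7.x and pip is version 7 or newer."""
    python_ok = parse_version(python_version)[:2] == (2, 7)
    pip_ok = parse_version(pip_version)[0] >= 7
    return python_ok and pip_ok


print(meets_requirements('2.7.14', '9.0.1'))  # True
print(meets_requirements('3.6.4', '9.0.1'))   # False
```

The Anaconda2 version is not checked here; the sketch only covers the Python and pip constraints.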

To install a different version of Python without overwriting the current version

https://conda.io/docs/user-guide/tasks/manage-python.html

Create a new environment and install the second Python version into it. To create the new environment for Python 2.7, in your Terminal window or an Anaconda Prompt, run:

conda create -n py27 python=2.7 anaconda

Activate the new environment

  • On Linux/macOS, use: source activate py27
  • On Windows, use: activate py27

To leave the environment: source deactivate py27. You can also use activate root to switch back to the root environment.

  1. Verify that the new environment is your current environment.
  2. To verify that the current environment uses the new Python version, in your Terminal window or an Anaconda Prompt, run: python --version

In [1]:
! python --version


Python 2.7.14 :: Anaconda, Inc.
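The same check can be made from inside Python. This is a minimal sketch, assuming conda exported the CONDA_DEFAULT_ENV variable when the environment was activated; the helper name describe_environment is hypothetical.

```python
import os
import sys


def describe_environment(environ=None):
    """Return (active conda environment name or None, 'major.minor' version)."""
    if environ is None:
        environ = os.environ
    env_name = environ.get('CONDA_DEFAULT_ENV')
    version = '%d.%d' % (sys.version_info[0], sys.version_info[1])
    return env_name, version


name, version = describe_environment()
print('conda environment: %s' % name)
print('python version: %s' % version)
```

Inside the py27 environment created above, this would be expected to report py27 and 2.7. If no conda environment is active, the name is None.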

Install your licensed copy of GraphLab Create

pip install --upgrade --no-cache-dir https://get.graphlab.com/GraphLab-Create/2.1/your registered email address here/your product key here/GraphLab-Create-License.tar.gz
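The download URL in the command above embeds the registered email address and the product key as path segments. The sketch below shows how such a URL could be assembled. The helper name license_url and the sample values are hypothetical placeholders, and it assumes both values are inserted verbatim, without URL encoding.

```python
BASE_URL = 'https://get.graphlab.com/GraphLab-Create/2.1'
PACKAGE = 'GraphLab-Create-License.tar.gz'


def license_url(email, product_key):
    """Build the licensed download URL from an email and a product key."""
    if not email or not product_key:
        raise ValueError('both email and product key are required')
    return '/'.join([BASE_URL, email, product_key, PACKAGE])


# Placeholder values for illustration only, not a real license.
print(license_url('user@example.edu', 'XXXX-XXXX'))
```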

Error

Could not find a version that satisfies the requirement graphlab-create>=2.1 (from GraphLab-Create-License==2.1) (from versions: ) No matching distribution found for graphlab-create>=2.1 (from GraphLab-Create-License==2.1)

Usage

https://turi.com/learn/userguide/

GraphLab Create is a Python package that allows programmers to perform end-to-end large-scale data analysis and data product development.

  • Data ingestion and cleaning with SFrames. SFrame is an efficient disk-based tabular data structure that is not limited by RAM. This lets you scale your analysis and data processing to handle terabytes of data, even on your laptop.

  • Data exploration and visualization with GraphLab Canvas. GraphLab Canvas is a browser-based interactive GUI that allows you to explore tabular data, summary plots and statistics.

  • Network analysis with SGraph. SGraph is a disk-based graph data structure that stores vertices and edges in SFrames.

  • Predictive model development with machine learning toolkits. GraphLab Create includes several toolkits for quick prototyping with fast, scalable algorithms.

  • Production automation with data pipelines. Data pipelines allow you to assemble reusable code tasks into jobs and automatically run them on common execution environments (e.g. Amazon Web Services, Hadoop).
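To make the SGraph description above concrete: a graph can be held as two tables, one row per vertex and one row per edge. The sketch below uses plain Python lists of dictionaries as stand-ins for SFrames. It does not call the GraphLab API, and the column names id, src and dst as well as the helper names are illustrative assumptions.

```python
# Plain-Python stand-ins for the vertex and edge tables (not GraphLab code).
vertices = [
    {'id': 0, 'breed': 'labrador'},
    {'id': 1, 'breed': 'labrador'},
    {'id': 2, 'breed': 'vizsla'},
]
edges = [
    {'src': 1, 'dst': 2},
]


def summary(vertices, edges):
    """Count rows in each table, mirroring a graph's vertex and edge totals."""
    return {'num_vertices': len(vertices), 'num_edges': len(edges)}


def out_neighbors(edges, vertex_id):
    """Return the destination ids of all edges leaving vertex_id."""
    return [row['dst'] for row in edges if row['src'] == vertex_id]


print(summary(vertices, edges))
print(out_neighbors(edges, 1))
```

Keeping vertices and edges in separate tables means vertex attributes (here, breed) are stored once per vertex rather than repeated on every edge.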


In [2]:
from graphlab import SGraph, Vertex, Edge
g = SGraph()
verts = [Vertex(0, attr={'breed': 'labrador'}),
         Vertex(1, attr={'breed': 'labrador'}),
         Vertex(2, attr={'breed': 'vizsla'})]
g = g.add_vertices(verts)
g = g.add_edges(Edge(1, 2))
print(g)


This non-commercial license of GraphLab Create for academic use is assigned to wangchengjun@nju.edu.cn and will expire on March 14, 2019.
[INFO] graphlab.cython.cy_server: GraphLab Create v2.1 started. Logging: /tmp/graphlab_server_1551671450.log
SGraph({'num_edges': 1, 'num_vertices': 3})

In [4]:
g.show()


Canvas is accessible via web browser at the URL: http://localhost:62302/index.html
Opening Canvas in default web browser.

In [5]:
from graphlab import SFrame, SGraph
# Read the edge list from a local file.
# Remote alternative: 'https://static.turi.com/datasets/bond/bond_edges.csv'
edge_data = SFrame.read_csv('../data/bond_edges.csv')

g = SGraph()
g = g.add_edges(edge_data, src_field='src', dst_field='dst')
print(g)


Finished parsing file /Users/datalab/github/bigdata/data/bond_edges.csv
Parsing completed. Parsed 20 lines in 0.022952 secs.
------------------------------------------------------
Inferred types from first 100 line(s) of file as 
column_type_hints=[str,str,str]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
Finished parsing file /Users/datalab/github/bigdata/data/bond_edges.csv
Parsing completed. Parsed 20 lines in 0.009524 secs.
SGraph({'num_edges': 20, 'num_vertices': 10})
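When a graph is built from edges alone, as above, the vertex set is implied by the distinct values in the source and destination columns. The standard-library sketch below illustrates that counting step. The sample rows and the relation column are hypothetical stand-ins, not the contents of bond_edges.csv; only the src and dst field names come from the call above.

```python
import csv

# Hypothetical sample in the same src/dst layout (not the real bond data).
SAMPLE = """src,dst,relation
A,B,knows
B,A,knows
A,C,works_for
"""


def count_graph(csv_text, src_field='src', dst_field='dst'):
    """Return (number of edges, number of distinct vertices) for an edge list."""
    rows = list(csv.DictReader(csv_text.splitlines()))
    vertex_ids = set()
    for row in rows:
        vertex_ids.add(row[src_field])
        vertex_ids.add(row[dst_field])
    return len(rows), len(vertex_ids)


print(count_graph(SAMPLE))  # (3, 3)
```

This is why 20 edge rows can yield only 10 vertices: each vertex is counted once, however many edges it takes part in.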

In [3]:
vertex_data = SFrame.read_csv('https://static.turi.com/datasets/bond/bond_vertices.csv')

g = SGraph(vertices=vertex_data, edges=edge_data, vid_field='name',
           src_field='src', dst_field='dst')


Downloading https://static.turi.com/datasets/bond/bond_vertices.csv to /var/tmp/graphlab-chengjun/1639/be40440c-c4eb-45f1-ae71-501011acc59f.csv
Finished parsing file https://static.turi.com/datasets/bond/bond_vertices.csv
Parsing completed. Parsed 10 lines in 0.010129 secs.
------------------------------------------------------
Inferred types from first line of file as 
column_type_hints=[str,str,int,int]
If parsing fails due to incorrect types, you can correct
the inferred type list above and pass it to read_csv in
the column_type_hints argument
------------------------------------------------------
Finished parsing file https://static.turi.com/datasets/bond/bond_vertices.csv
Parsing completed. Parsed 10 lines in 0.0082 secs.
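Because vid_field='name' ties each edge endpoint to a row of the vertex table, it can be useful to confirm beforehand that the endpoint values actually match the vertex names. The following plain-Python sketch does that for lists of dictionaries. The helper name unmatched_endpoints and the sample rows are hypothetical, and the check is not something the GraphLab API performs in this form.

```python
def unmatched_endpoints(vertex_rows, edge_rows, vid_field='name',
                        src_field='src', dst_field='dst'):
    """Return the sorted endpoint ids that have no row in the vertex table."""
    known = set(row[vid_field] for row in vertex_rows)
    used = set()
    for row in edge_rows:
        used.add(row[src_field])
        used.add(row[dst_field])
    return sorted(used - known)


# Hypothetical rows for illustration (not the real bond data).
vertex_rows = [{'name': 'A'}, {'name': 'B'}]
edge_rows = [{'src': 'A', 'dst': 'B'}, {'src': 'B', 'dst': 'C'}]
print(unmatched_endpoints(vertex_rows, edge_rows))  # ['C']
```

An empty result means every edge refers only to vertices that have attribute rows.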

In [6]:
g.show(vlabel='id', highlight=['James Bond', 'Moneypenny'],
       arrows=True)


Canvas is updated and available in a tab in the default browser.