Welcome to InFlu!

InFlu is an application for social network analysis whose main feature is finding key nodes in the social influence process. It lets you choose nodes based on typical SNA metrics or by using two models of social influence: Linear Threshold and Independent Cascade. There are three methods that use the influence models to find the key nodes: brute-force, greedy, and an improved greedy method known as CELF++, proposed in this paper.

First steps

Only Python 3.6 or higher is supported.

If the dependencies are satisfied, installation should be as simple as:

pip install influ

Dependencies

All dependencies will be installed automatically. However, one of the most important dependencies is igraph, a Python library whose core is written in C. This sometimes causes installation problems; the igraph documentation should help.

Another potentially problematic dependency is pycairo; its documentation describes the installation in detail.

Debian / Ubuntu and derivatives

Installing the following packages should help with both igraph and pycairo:

sudo apt install build-essential python3-dev libxml2 libxml2-dev zlib1g-dev
sudo apt install libcairo2-dev pkg-config

Windows

If you are using Windows, you have to download the unofficial igraph installer and install it by executing:

pip install <python_igraph-[igraph-version]-[python-version]-[windows version]>.whl

Although the installer is unofficial, it is recommended by the maintainers of the igraph library.

The pycairo library on Windows needs Microsoft Visual C++ 14.0 to be installed.

macOS

There have been no attempts to install influ on macOS. Any information about it would be appreciated.

Konect Reader

To let you test your ideas quickly, there is a KonectReader, which simplifies downloading and extracting datasets and loading them into a Graph object.

from influ import reader

kr = reader.KonectReader()
print(kr.list) # list available datasets
graph = kr.load('manufacturing_emails')  # load dataset

Currently only a few datasets are available, but you can provide your own config file that specifies other datasets. Only datasets from Konect are supported.

Your own config file

# Content of my_custom_config.yaml
example_dataset:  # name that will be used to access dataset
  name: Example Dataset 1
  url: http://konect.uni-koblenz.de/networks/dataset_examle  # url where dataset is described [optional]
  download: http://konect.uni-koblenz.de/downloads/tsv/dataset_examle.tar.bz2  # url where dataset can be downloaded directly
  file: out.dataset_example_example  # name of the "out" file inside the downloaded archive
  directed: False  # should the graph be treated as directed?
  edge_attributes:  # list of edge attribute names
    - distance      # if this list is empty or the file has more attributes,
    - another_attr  # the unnamed ones are named `attrX`, where X is the index counted from 0
  vertex_attributes:                # list of vertex attributes with files where they are stored
    - name: alias                   # name of attribute
      file: ent.vertex_alias_name   # file with attribute

Loading your custom config extends (does not override) the configs that were loaded previously.

from influ import reader

kr = reader.KonectReader('./my_custom_config.yaml')  # loading at creation time
kr.add_config('./my_custom_config.yaml')      # adding config after creation
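The "extends, does not override" behaviour can be pictured with plain dictionaries. This is only a sketch and not InFlu's implementation: `extend_datasets` is a hypothetical helper, the entries are placeholders, and it assumes that a dataset name defined twice takes the newer definition.

```python
def extend_datasets(registry, new_config):
    """Return a registry holding the old entries plus the new ones."""
    merged = dict(registry)    # keep everything that was loaded so far
    merged.update(new_config)  # assumption: a repeated name takes the newer definition
    return merged


# Placeholder entries, standing in for parsed YAML config files.
default_config = {"manufacturing_emails": {"name": "built-in dataset"}}
custom_config = {"example_dataset": {"name": "Example Dataset 1"}}

registry = extend_datasets(default_config, custom_config)
print(sorted(registry))  # both the built-in and the custom dataset are listed
```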

Working example

from influ import reader, finder

kr = reader.KonectReader()
graph = kr.load('manufacturing_emails')

sfinder = finder.SeedFinder(graph)
sfinder.configure(number=5, unit='number')
result = sfinder.greedy(model=finder.Model.IndependentCascade, depth=1)
sfinder.plot_influence(result, model=finder.Model.IndependentCascade, depth=1)

Reader

from influ import reader

read_graph

Reads a graph from a file. Files saved in the “events” format are read by InFlu itself; otherwise the igraph read method is used. Parameters:

  • filepath [required] - path to the file with the graph
  • file_format [optional, default: None] - optional (but recommended) format name, e.g. events for the InFlu native file format or ncol for the NCOL format
  • kwargs [optional] - additional keyword arguments specific to the source file type: directed, sep, etc.

konect_to_csv

Converts a Konect dataset to a format readable by influ. Parameters:

  • source_path [required] - path to the “out” file downloaded from Konect
  • dest_path [required] - path where the converted file will be saved
  • edge_attrs [optional] - names of edge attributes that are known to exist in the out file; if this parameter is missing or the file contains more attributes, the unnamed attributes are named attr{index}
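The attribute naming rule can be sketched as follows. This is not the library's code: `name_attributes` and `konect_to_rows` are hypothetical helpers, the `source` and `target` column names are invented for the illustration, and it assumes that the index in `attr{index}` is the position of the attribute column counted from 0.

```python
import csv
import io


def name_attributes(count, edge_attrs=()):
    """Name `count` attribute columns, falling back to attr{index}."""
    names = list(edge_attrs[:count])
    # Assumption: the index is the attribute's column position, counted from 0.
    names += [f"attr{i}" for i in range(len(names), count)]
    return names


def konect_to_rows(lines, edge_attrs=()):
    """Parse Konect-style edge lines into a header and a list of rows."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("%"):  # '%' lines carry metadata
            continue
        rows.append(line.split())
    width = max((len(row) for row in rows), default=2)
    header = ["source", "target"] + name_attributes(max(width - 2, 0), edge_attrs)
    return header, rows


# Made-up sample in the layout of a Konect "out" file.
sample = ["% sym weighted", "1 2 0.5 10", "2 3 1.5 20"]
header, rows = konect_to_rows(sample, edge_attrs=["distance"])

buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(header)
writer.writerows(rows)
print(buffer.getvalue())
```

Only the first attribute is named here, so the second column falls back to the generated name.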

KonectReader

Interface for downloading the configured datasets from Konect.

from influ import reader

kr = reader.KonectReader()
print(kr.list) # list available datasets
graph = kr.load(kr.list[0])

KonectReader . list

List of names of available datasets

KonectReader . add_config

Adds the datasets specified by the user in a configuration file. A detailed description of the configuration file can be found in the “First steps - Your own config file” section. Parameters:

  • path [required] - path to config file

KonectReader . load

Loads a dataset from Konect and returns the loaded graph. Parameters:

  • dataset_name [required] - name of the dataset to download; it has to be one of the names in the list attribute

Finder

from influ import finder

SeedFinder

This is the basic object for finding key nodes in your network. It works for both directed and undirected graphs. Graphs need to have a special structure, so only graphs loaded with the reader module are recommended.

Initialization parameters

  • graph [required] – graph that will be analysed
  • number [optional, default: 5] - number or percentage of seeds to choose
  • unit [optional, default: ‘number’] - either percent or number
  • random_seed [optional, default: None] – value used to seed the random function to ensure repeatable results
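How number and unit combine into an actual seed count might look like the sketch below. `seed_count` is a hypothetical helper, not part of InFlu, and both the rounding of percentages and the clamping to the graph size are assumptions.

```python
def seed_count(vertex_count, number=5, unit="number"):
    """Translate (number, unit) into how many seeds to pick."""
    if unit == "number":
        count = int(number)
    elif unit == "percent":
        # Assumption: the percentage is rounded with Python's round().
        count = round(vertex_count * number / 100)
    else:
        raise ValueError("unit must be 'number' or 'percent'")
    # Assumption: never ask for more seeds than there are vertices.
    return max(0, min(count, vertex_count))


print(seed_count(200, number=5, unit="number"))
print(seed_count(200, number=5, unit="percent"))
```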

SeedFinder . configure

Parameters:

  • number [optional, default: None] – number or percentage of seeds to choose; has to be configured together with the unit parameter
  • unit [optional, default: None] – either percent or number; has to be configured together with the number parameter
  • random_seed [optional, default: None] – value used to seed the random function to ensure repeatable results. It is applied at the beginning of every model evaluation. If random_seed is None (default), no random seed is used
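The effect of reseeding at the beginning of every evaluation can be illustrated with the standard library alone. `simulate` is a toy stand-in for one model evaluation, not an InFlu function.

```python
import random


def simulate(probability, trials, random_seed=None):
    """Toy model evaluation: count how many random trials succeed."""
    # A fresh generator is seeded at the start of every evaluation;
    # with random_seed=None the generator is seeded unpredictably.
    rng = random.Random(random_seed)
    return sum(rng.random() < probability for _ in range(trials))


first = simulate(0.3, 100, random_seed=42)
second = simulate(0.3, 100, random_seed=42)
print(first == second)  # True: same seed, same sequence of random draws
```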

SeedFinder . by_indegree

Returns a list of the indices of the first n vertices sorted by their indegree. Takes no parameters.

SeedFinder . by_outdegree

Returns a list of the indices of the first n vertices sorted by their outdegree. Takes no parameters.

SeedFinder . by_degree

Returns a list of the indices of the first n vertices sorted by their degree. Takes no parameters.
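The three degree-based rankings can be sketched on a plain edge list. `top_by_degree` and its `mode` argument are hypothetical, and the sketch assumes that vertices are ordered from the highest degree down, with ties broken by the smaller vertex id.

```python
def top_by_degree(edges, n, mode="all"):
    """Return the n vertex ids with the highest degree, highest first."""
    degree = {vertex: 0 for edge in edges for vertex in edge}
    for source, target in edges:
        if mode in ("out", "all"):
            degree[source] += 1
        if mode in ("in", "all"):
            degree[target] += 1
    # Highest degree first; ties go to the smaller vertex id (an assumption).
    ranked = sorted(degree, key=lambda vertex: (-degree[vertex], vertex))
    return ranked[:n]


edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 2)]  # directed (source, target) pairs
print(top_by_degree(edges, 2, mode="in"))   # [2, 1]
print(top_by_degree(edges, 2, mode="out"))  # [0, 1]
print(top_by_degree(edges, 2, mode="all"))  # [0, 2]
```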

SeedFinder . by_betweenness

Returns a list of the indices of the first n vertices sorted by their betweenness. Takes no parameters.

SeedFinder . by_clustering_coefficient

Returns a list of the indices of the first n vertices sorted by their clustering coefficient (transitivity). IMPORTANT: in a directed graph only mutual edges are considered. Takes no parameters.
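What "only mutual edges" means for the clustering coefficient can be shown with a small pure-Python sketch. Both helpers are hypothetical, and treating vertices with fewer than two neighbours as having a coefficient of 0 is an assumption.

```python
from itertools import combinations


def mutual_neighbors(edges):
    """Build an undirected adjacency map that keeps only reciprocated edges."""
    edge_set = set(edges)
    adjacency = {vertex: set() for edge in edges for vertex in edge}
    for source, target in edge_set:
        if source != target and (target, source) in edge_set:
            adjacency[source].add(target)
            adjacency[target].add(source)
    return adjacency


def local_transitivity(adjacency, vertex):
    """Fraction of neighbour pairs of `vertex` that are connected to each other."""
    neighbors = adjacency[vertex]
    if len(neighbors) < 2:
        return 0.0  # assumption: undefined coefficients are treated as 0
    pairs = list(combinations(sorted(neighbors), 2))
    closed = sum(1 for a, b in pairs if b in adjacency[a])
    return closed / len(pairs)


directed_edges = [
    (0, 1), (1, 0),
    (0, 2), (2, 0),
    (1, 2), (2, 1),
    (0, 3), (3, 0),
    (3, 1),  # one-way edge, ignored because it is not mutual
]
adjacency = mutual_neighbors(directed_edges)
scores = {vertex: local_transitivity(adjacency, vertex) for vertex in adjacency}
print(scores)
```

In this toy graph vertex 3 keeps only vertex 0 as a neighbour, because its edge to vertex 1 is not reciprocated.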

SeedFinder . greedy

Searches for the indices of the vertices that are the best seeds, using the greedy approach.

Parameters:

  • model [optional, default: Model.LinearThreshold] - model of social influence. Currently only Linear Threshold (LT) and Independent Cascade (IC) are available
  • threshold [optional, default: None] - threshold value used by the influence model. In the Linear Threshold model it is the sum of influence that has to be applied to a node to activate it. In the Independent Cascade model it is the probability that an activated node activates another node
  • depth [optional, default: None] - number of iterations in the spreading simulations

Returns: list of ids of the nodes considered the best seeds
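The idea behind greedy selection with the Independent Cascade model can be sketched in plain Python. This is a simplified illustration under stated assumptions, not InFlu's implementation: the graph is a dict of successor lists, the function names and the number of Monte Carlo runs are made up, and one probability is shared by all edges.

```python
import random


def independent_cascade(graph, seeds, probability, depth, rng):
    """Run one cascade and return the set of activated nodes."""
    active = set(seeds)
    frontier = set(seeds)
    for _ in range(depth):
        newly_active = set()
        for node in sorted(frontier):
            for neighbor in graph.get(node, ()):
                if neighbor in active or neighbor in newly_active:
                    continue
                # Each newly active node gets one chance per inactive neighbour.
                if rng.random() < probability:
                    newly_active.add(neighbor)
        if not newly_active:
            break
        active |= newly_active
        frontier = newly_active
    return active


def expected_spread(graph, seeds, probability, depth, runs=200, random_seed=0):
    """Average cascade size over several runs, reseeded for every evaluation."""
    rng = random.Random(random_seed)
    sizes = (len(independent_cascade(graph, seeds, probability, depth, rng))
             for _ in range(runs))
    return sum(sizes) / runs


def greedy_seeds(graph, k, probability=0.5, depth=2):
    """Repeatedly add the node that gives the largest estimated spread."""
    nodes = set(graph) | {n for targets in graph.values() for n in targets}
    seeds = []
    for _ in range(min(k, len(nodes))):
        candidates = sorted(nodes - set(seeds))
        best = max(candidates,
                   key=lambda n: expected_spread(graph, seeds + [n], probability, depth))
        seeds.append(best)
    return seeds


graph = {0: [1, 2, 3, 4, 5], 6: [7]}  # toy graph: node -> list of successors
print(greedy_seeds(graph, k=2, probability=0.5, depth=1))
```

Every step re-evaluates all remaining candidates, which is why the greedy method gets slow on larger graphs.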

SeedFinder . brute_force

Searches for the indices of the vertices that are the best seeds, using the brute-force approach.

Parameters:

  • model [optional, default: Model.LinearThreshold] - model of social influence. Currently only Linear Threshold (LT) and Independent Cascade (IC) are available
  • threshold [optional, default: None] - threshold value used by the influence model. In the Linear Threshold model it is the sum of influence that has to be applied to a node to activate it. In the Independent Cascade model it is the probability that an activated node activates another node
  • depth [optional, default: None] - number of iterations in the spreading simulations

Returns: list of ids of the nodes considered the best seeds
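A brute-force search simply evaluates every possible seed set. The sketch below pairs it with a simplified Linear Threshold model and is not InFlu's code: the names are hypothetical, and it assumes that every in-neighbour contributes an equal share of influence, so a node activates once the fraction of its active in-neighbours reaches the threshold.

```python
from itertools import combinations


def linear_threshold(in_neighbors, seeds, threshold, depth):
    """Deterministic LT spread; returns the set of activated nodes."""
    active = set(seeds)
    for _ in range(depth):
        newly_active = {
            node
            for node, sources in in_neighbors.items()
            if node not in active
            and sources
            and len(active & sources) / len(sources) >= threshold
        }
        if not newly_active:
            break
        active |= newly_active
    return active


def brute_force_seeds(in_neighbors, k, threshold=0.5, depth=3):
    """Try every k-subset of nodes and keep the one with the largest spread."""
    nodes = sorted(set(in_neighbors) | {s for src in in_neighbors.values() for s in src})
    best = max(
        combinations(nodes, k),
        key=lambda seeds: len(linear_threshold(in_neighbors, seeds, threshold, depth)),
    )
    return list(best)


# Toy graph: node -> set of nodes that influence it.
in_neighbors = {0: set(), 1: {0}, 2: {0}, 3: {1, 2}, 4: set(), 5: {4}}
print(brute_force_seeds(in_neighbors, k=2, threshold=1.0, depth=2))  # [0, 4]
```

The number of candidate sets grows combinatorially with the graph size and with k, so this approach is only practical for very small inputs.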

SeedFinder . CELFpp

Searches for the indices of the vertices that are the best seeds, using the CELF++ approach.

Parameters:

  • model [optional, default: Model.LinearThreshold] - model of social influence. Currently only Linear Threshold (LT) and Independent Cascade (IC) are available
  • threshold [optional, default: None] - threshold value used by the influence model. In the Linear Threshold model it is the sum of influence that has to be applied to a node to activate it. In the Independent Cascade model it is the probability that an activated node activates another node
  • depth [optional, default: None] - number of iterations in the spreading simulations

Returns: list of ids of the nodes considered the best seeds
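The speed-up over plain greedy comes from lazy evaluation of marginal gains. The sketch below shows the simpler CELF idea that CELF++ builds on, not CELF++ itself and not InFlu's implementation. To keep it deterministic, spread is assumed to be plain reachability within depth hops, and all names are hypothetical.

```python
import heapq


def reach(graph, seeds, depth):
    """Deterministic spread: nodes reachable from the seeds within `depth` hops."""
    active = set(seeds)
    frontier = set(seeds)
    for _ in range(depth):
        frontier = {n for node in frontier for n in graph.get(node, ())} - active
        if not frontier:
            break
        active |= frontier
    return active


def celf_seeds(graph, k, depth=2):
    """Lazy greedy: recompute a marginal gain only when it reaches the heap top."""
    nodes = sorted(set(graph) | {n for targets in graph.values() for n in targets})
    # Heap entries: (-marginal gain, node, number of seeds when the gain was computed).
    heap = [(-len(reach(graph, [node], depth)), node, 0) for node in nodes]
    heapq.heapify(heap)
    seeds, covered = [], 0
    while heap and len(seeds) < k:
        neg_gain, node, computed_at = heapq.heappop(heap)
        if computed_at == len(seeds):
            # The gain is current, so no other node can beat it.
            seeds.append(node)
            covered += -neg_gain
        else:
            # Stale gain: recompute against the current seeds and push it back.
            gain = len(reach(graph, seeds + [node], depth)) - covered
            heapq.heappush(heap, (-gain, node, len(seeds)))
    return seeds


graph = {0: [1, 2, 3], 1: [4], 2: [3], 5: [6, 7]}  # toy graph: node -> successors
print(celf_seeds(graph, k=2, depth=2))  # [0, 5]
```

Because marginal gains can only shrink as the seed set grows, a stale gain is an upper bound, and most candidates never need to be re-evaluated.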

SeedFinder . plot_influence

Runs an influence simulation for the given set of seeds and plots the resulting graph.

Parameters:

  • seeds [required] - list of seed ids for the influence spreading simulation
  • model [optional, default: Model.LinearThreshold] - model of social influence. Currently only Linear Threshold (LT) and Independent Cascade (IC) are available
  • threshold [optional, default: None] - threshold value used by the influence model. In the Linear Threshold model it is the sum of influence that has to be applied to a node to activate it. In the Independent Cascade model it is the probability that an activated node activates another node
  • depth [optional, default: None] - number of iterations in the spreading simulations

Returns: list of ids of the nodes considered the best seeds

Model

Enum of the available social influence models.

Model . LinearThreshold

Enum value. Represents the Linear Threshold model.

Model . IndependentCascade

Enum value. Represents the Independent Cascade model.