factorset


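Factorset is a factor construction toolkit for the Chinese A-share market. It provides a collection of factors for the A-share market, including calculation methods for both common and specialized factors, together with a lightweight, highly extensible factor computation framework. The project is under active development.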

Features

  • Ease of Use: factorset tries to get out of your way so that you can focus on algorithm development. See below for a code example.

Installation

Installing With pip

Assuming you have all required (see note below) non-Python dependencies, you can install factorset with pip via:

$ pip install factorset

Note: Installing factorset via pip is slightly more involved than the average Python package. Simply running pip install factorset will likely fail if you’ve never installed any scientific Python packages before.

The FundCrawler class in factorset depends on proxy_pool, a powerful proxy crawler.

Quickstart

The following code defines an EP_TTM factor class.

import os
import pandas as pd
import tushare as ts
from factorset.factors import BaseFactor
from factorset.data.OtherData import code_to_symbol, shift_date, market_value
from factorset.data import CSVParser as cp
from factorset.Util.finance import ttmContinues

class EP_TTM(BaseFactor):
    """
    :Name: Inverse of the trailing-four-quarter (12-month) price-to-earnings ratio
    :Calculation: EP_TTM = net profit (excluding minority interests)_TTM / total market value
    :Application: A lower P/E ratio means investors can buy the stock at a relatively low price.

    """

    def __init__(self, factor_name='EP_TTM', tickers='000016.SH', data_source='', factor_parameters={}, save_dir=None):
        # Initialize super class.
        super(EP_TTM, self).__init__(factor_name=factor_name, tickers=tickers,
                                     factor_parameters=factor_parameters,
                                     data_source=data_source, save_dir=save_dir)

    def prepare_data(self, begin_date, end_date):
        """
        Data preprocessing.
        """

        shifted_begin_date = shift_date(begin_date, 500)  # look back 500 trading days

        # Take "net profit attributable to shareholders of the parent company" from the
        # income statement; item names and numeric codes are listed in FundDict.
        inst = cp.concat_fund(self.data_source, self.tickers, 'IS').loc[shifted_begin_date:end_date, ['ticker', 40]]
        inst['motherNetProfit'] = inst[40]
        inst.drop(40, axis=1, inplace=True)

        # The TTM algorithm needs both the release date and the report date of each statement.
        inst['release_date'] = inst.index
        inst['report_date'] = inst.index

        # TTM net profit
        profitTTM_ls = []
        for ticker in inst['ticker'].unique():
            try:  # fewer than 4 financial records raises an exception
                profit_df = ttmContinues(inst[inst['ticker'] == ticker], 'motherNetProfit')
                profit_df['ticker'] = ticker
            except Exception:
                continue
            profitTTM_ls.append(profit_df)
        self.profitTTM = pd.concat(profitTTM_ls)

        # Read total market value from "OtherData".
        # Tushare market value data is only available from June 2017 onward.
        df = market_value(os.path.join(self.data_source, 'other', 'otherdata.csv'), self.tickers)
        self.mkt_value = df.drop(['price', 'totals'], axis=1)

    def generate_factor(self, trading_day):
        # generate_factor is called for each trading day in the range and
        # produces the factor values of all stocks for that day.
        earnings_df = self.profitTTM[self.profitTTM['datetime'] <= trading_day]
        earnings_df = earnings_df.sort_values(by=['datetime', 'report_date'], ascending=[False, False])

        # Keep the most recent TTM record of each ticker.
        # groupby(...).head(1) leaves 'ticker' as a plain column for the merge below.
        earnings_df = earnings_df.groupby('ticker').head(1)

        # Market value on the current trading day
        today_mkt_value = self.mkt_value.loc[trading_day]

        ret_df = earnings_df.merge(today_mkt_value, on='ticker', how='inner')
        ret_df['EP_TTM'] = ret_df['motherNetProfit_TTM'] / ret_df['mkt_value']
        return ret_df.set_index('ticker')['EP_TTM']

if __name__ == '__main__':
    from_dt = '2017-07-15'
    to_dt = '2018-04-09'

    # Use the CSI 300 constituents
    hs300 = ts.get_hs300s()
    hs300.code = hs300.code.apply(code_to_symbol)

    # Use a separate name for the instance so the EP_TTM class is not shadowed.
    factor = EP_TTM(
        factor_name='EP_TTM',
        factor_parameters={},
        tickers=hs300.code.tolist(),
        save_dir='',
        data_source=os.path.abspath('.'),
    )

    factor.generate_factor_and_store(from_dt, to_dt)
    print('Factor built and stored successfully!')

You can find other factors in the factorset/factors directory.
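
To make the arithmetic behind EP_TTM concrete, here is a minimal, library-free sketch of the per-day step: pick the latest TTM net profit announced on or before the trading day, then divide it by that day's total market value. The function name `ep_ttm` and all figures are hypothetical illustrations; this is not the factorset implementation, which works on pandas DataFrames.

```python
from datetime import date

# Hypothetical TTM net-profit records: (ticker, announcement date, TTM net profit).
profit_ttm = [
    ("000001.SZ", date(2017, 8, 11), 2.30e10),
    ("000001.SZ", date(2017, 10, 21), 2.40e10),
    ("000002.SZ", date(2017, 8, 25), 2.20e10),
]

# Hypothetical total market values on the trading day being computed.
market_value = {"000001.SZ": 2.0e11, "000002.SZ": 2.75e11}


def ep_ttm(records, mkt_value, trading_day):
    """Return {ticker: EP_TTM} from the latest record announced on or before trading_day."""
    latest = {}
    for ticker, announced, profit in records:
        if announced > trading_day:
            continue  # not yet public on this trading day
        if ticker not in latest or announced > latest[ticker][0]:
            latest[ticker] = (announced, profit)
    # Inner join: keep only tickers present in both inputs.
    return {
        ticker: profit / mkt_value[ticker]
        for ticker, (_, profit) in latest.items()
        if ticker in mkt_value
    }


factor = ep_ttm(profit_ttm, market_value, date(2017, 9, 29))
```

The record announced on 2017-10-21 is ignored for the 2017-09-29 trading day, which is the point of filtering by announcement date: the factor only uses information that was public at the time.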

Questions?

If you find a bug, feel free to open an issue and fill out the issue template.

Contributing

All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.

Installation

Stable release

To install factorset, run this command in your terminal:

$ pip install factorset

This is the preferred method to install factorset, as it will always install the most recent stable release.

If you don’t have pip installed, the Python installation guide can walk you through the process.

From sources

The sources for factorset can be downloaded from the Github repo.

You can either clone the public repository:

$ git clone https://github.com/quantasset/factorset

Or download the tarball:

$ curl  -OL https://github.com/quantasset/factorset/tarball/master

Once you have a copy of the source, you can install it with:

$ python setup.py install

Data Preparation

The Data Fetchers module crawls pricing data, fundamental data, and other market data such as market value. Fetchers let us crawl all of the data we need from Tushare or other sources.

Proxy Pool

To get fundamental data, we need to set up a proxy pool for FundCrawler.

Some recommended proxy pool projects:

dungproxy https://github.com/virjar/dungproxy
proxyspider https://github.com/zhangchenchen/proxyspider
ProxyPool https://github.com/henson/ProxyPool
ProxyPool https://github.com/WiseDoge/ProxyPool
IPProxyTool https://github.com/awolfly9/IPProxyTool
IPProxyPool https://github.com/qiyeboy/IPProxyPool
proxy_list https://github.com/gavin66/proxy_list
proxy_pool https://github.com/lujqme/proxy_pool
haipproxy https://github.com/SpiderClub/haipproxy
proxy_pool https://github.com/jhao104/proxy_pool

Note

factorset uses proxy_pool; see Proxy_start. You may also install Redis for your proxy pool.

Configuration file

[TARGET]
;all = all
;all = hs300
all = 000001.SZ, 000002.SZ

[OPTIONS]
MONGO = False
CSV = True
;succ and fail list of symbol
SFL = True

[STORE]
hqdir = ./hq
;Only fund crawler will use proxy_pool
funddir = ./fund
otherdir =  ./other

[ARCTIC]
;host = 127.0.0.1:27017
host = localhost

[READ]
proxypool = 127.0.0.1:5010
proxymin = 5
encode = gbk
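
As a rough illustration of how these settings can be read, the sketch below parses an abridged copy of the configuration with Python's standard configparser module. How factorset itself loads the file is not shown in this document, so the variable names here are assumptions made for the example.

```python
import configparser

# Abridged copy of the configuration shown above; lines starting with ';' are comments.
CONFIG_TEXT = """
[TARGET]
;all = all
;all = hs300
all = 000001.SZ, 000002.SZ

[OPTIONS]
MONGO = False
CSV = True
SFL = True

[STORE]
hqdir = ./hq
funddir = ./fund
otherdir = ./other

[READ]
proxypool = 127.0.0.1:5010
proxymin = 5
encode = gbk
"""

parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)

# The target list is a comma-separated string; split it into clean tickers.
tickers = [t.strip() for t in parser.get("TARGET", "all").split(",")]
use_csv = parser.getboolean("OPTIONS", "CSV")
use_mongo = parser.getboolean("OPTIONS", "MONGO")
proxy_min = parser.getint("READ", "proxymin")
fund_dir = parser.get("STORE", "funddir")
```

configparser treats option names case-insensitively, so MONGO and mongo refer to the same option.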

Run Fetcher

from factorset.Run import data_fetch
data_fetch.data_fetch()
Start Fetching Stock & Index Data!
000001.SZ写入完成
000002.SZ写入完成
Finish Fetching Stock & Index Data!

Start Fetching Other Data!
2017-06-15写入完成
2017-06-16写入完成
2017-06-19写入完成
2017-06-20写入完成
2017-06-21写入完成
2017-06-22写入完成
....
[Getting data:]##################
Finish Fetching Other Data!

Start Fetching Fundamental Data!
proxy: http://xxxx
proxy: http://xxxx
Put 000001.SZ in queue!
proxy: http://xxxx
Put 000001.SZ in queue!
proxy: http://xxxx
000002.SZ写入84条数据,
000001.SZ写入88条数据,
BS表数据导入成功!

proxy: http://xxxx
proxy: http://xxxx
Cannot connect to host xxxx ssl:False [Connect call failed ('xxxx', xxxx)]
Put 000001.SZ in queue!
proxy: http://xxxx
Put 000001.SZ in queue!
proxy: http://xxxx
000002.SZ写入85条数据,
000001.SZ写入91条数据,
IS表数据导入成功!

proxy: http://xxxx
proxy: http://xxxx
Put 000001.SZ in queue!
Put 000002.SZ in queue!
proxy: http://xxxx
proxy: http://xxxx
000001.SZ写入72条数据,
000002.SZ写入71条数据,
CF表数据导入成功!

Finish Fetching Fundamental Data!

Note

Make sure the number of proxies in your proxy pool is not less than the proxymin value set in the configuration file.

import os
from factorset.data import CSVParser as cp
cp.concat_fund(os.path.abspath('.'), cp.all_fund_symbol(os.path.abspath('.'), 'BS'), 'BS').loc[:,['ticker', 121, 101, 68, 109, 0]]
date        ticker     121       101       68        109       0
2018/3/31   000002.SZ  1.22E+12  8.81E+11  3.42E+09  7.80E+09  4.11E+10
2017/12/31  000002.SZ  1.17E+12  8.47E+11  3.33E+09  1.61E+10  4.62E+10
2017/9/30   000002.SZ  1.02E+12  7.43E+11  2.79E+09  1.73E+10  3.69E+10
2017/6/30   000002.SZ  9.29E+11  6.76E+11  2.22E+09  1.34E+10  3.67E+10
2017/3/31   000002.SZ  8.87E+11  6.28E+11  2.63E+09  1.59E+10  3.25E+10
2016/12/31  000002.SZ  8.31E+11  5.80E+11  3.60E+09  1.66E+10  2.68E+10
2016/9/30   000002.SZ  7.56E+11  5.53E+11  5.35E+09  6.84E+09  2.41E+10
2016/6/30   000002.SZ  7.12E+11  5.10E+11  8.34E+09  2.85E+09  2.64E+10
2016/3/31   000002.SZ  6.59E+11  4.57E+11  1.15E+10  3.30E+09  2.37E+10
2015/12/31  000002.SZ  6.11E+11  4.20E+11  1.67E+10  1.90E+09  2.47E+10
2015/9/30   000002.SZ  5.71E+11  4.02E+11  1.83E+10  7.63E+08  2.34E+10
2015/6/30   000002.SZ  5.37E+11  3.79E+11  2.16E+10  4.77E+08  2.33E+10
2015/3/31   000002.SZ  5.26E+11  3.69E+11  2.47E+10  1.02E+09  2.26E+10
2014/12/31  000002.SZ  5.08E+11  3.46E+11  2.13E+10  2.38E+09  2.04E+10
2014/9/30   000002.SZ  5.20E+11  3.56E+11  2.13E+10  6.03E+09  1.38E+10
2014/6/30   000002.SZ  5.02E+11  3.36E+11  1.66E+10  6.58E+09  1.57E+10
2014/3/31   000002.SZ  4.95E+11  3.41E+11  1.40E+10  4.38E+09  2.35E+10
2013/12/31  000002.SZ  4.79E+11  3.29E+11  1.48E+10  5.10E+09  2.75E+10
2013/9/30   000002.SZ  4.61E+11  3.30E+11  1.32E+10  1.25E+10  2.64E+10
2013/6/30   000002.SZ  4.32E+11  3.14E+11  1.09E+10  1.20E+10  3.29E+10
2013/3/31   000002.SZ  4.18E+11  2.99E+11  7.87E+09  1.31E+10  3.10E+10
2012/12/31  000002.SZ  3.79E+11  2.60E+11  4.98E+09  9.93E+09  2.56E+10
2012/9/30   000002.SZ  3.48E+11  2.34E+11  1.69E+09  4.48E+09  1.41E+10
2012/6/30   000002.SZ  3.30E+11  2.17E+11  2.72E+08  4.80E+09  1.55E+10
2012/3/31   000002.SZ  3.10E+11  2.03E+11  0.00E+00  2.04E+09  1.58E+10
2011/12/31  000002.SZ  2.96E+11  2.01E+11  3.13E+07  1.72E+09  2.18E+10
2011/9/30   000002.SZ  2.83E+11  1.95E+11  3.13E+07  9.21E+08  2.28E+10
2011/6/30   000002.SZ  2.61E+11  1.73E+11  0.00E+00  1.61E+09  2.14E+10
2011/3/31   000002.SZ  2.35E+11  1.52E+11  0.00E+00  1.72E+09  1.86E+10
2010/12/31  000002.SZ  2.16E+11  1.30E+11  0.00E+00  1.48E+09  1.53E+10
...         ...        ...       ...       ...       ...       ...
2004/9/30   000002.SZ  1.58E+10  7.06E+09  0.00E+00  2.81E+09  6.00E+07
2004/6/30   000002.SZ  1.32E+10  6.43E+09  4.00E+06  3.24E+09  1.60E+08
2004/3/31   000002.SZ  1.18E+10  5.76E+09  3.15E+09  0.00E+00  1.60E+08
2003/12/31  000002.SZ  1.06E+10  4.80E+09  3.00E+06  1.68E+09  1.60E+08
2003/9/30   000002.SZ  1.02E+10  4.88E+09  9.72E+06  1.56E+09  0.00E+00
2003/6/30   000002.SZ  9.47E+09  4.17E+09  9.69E+06  1.04E+09  0.00E+00
2003/3/31   000002.SZ  8.32E+09  3.14E+09  9.69E+06  5.20E+08  0.00E+00
2002/12/31  000002.SZ  8.22E+09  3.08E+09  9.69E+06  4.60E+08  0.00E+00
2002/9/30   000002.SZ  8.21E+09  3.18E+09  5.75E+08  7.60E+08  0.00E+00
2002/6/30   000002.SZ  8.22E+09  3.27E+09  0.00E+00  1.12E+09  0.00E+00
2002/3/31   000002.SZ  6.60E+09  3.23E+09  0.00E+00  1.50E+09  0.00E+00
2001/12/31  000002.SZ  6.48E+09  3.05E+09  0.00E+00  1.35E+09  0.00E+00
2001/6/30   000002.SZ  6.07E+09  2.73E+09  0.00E+00  9.86E+08  0.00E+00
2000/12/31  000002.SZ  5.58E+09  2.53E+09  0.00E+00  5.66E+08  0.00E+00
2000/6/30   000002.SZ  5.05E+09  2.06E+09  0.00E+00  6.96E+08  2.00E+07
1999/12/31  000002.SZ  4.49E+09  2.29E+09  0.00E+00  8.95E+08  0.00E+00
1999/6/30   000002.SZ  4.34E+09  2.11E+09  0.00E+00  8.15E+08  0.00E+00
1998/12/31  000002.SZ  4.04E+09  1.92E+09  0.00E+00  6.19E+08  0.00E+00
1998/6/30   000002.SZ  3.95E+09  1.93E+09  0.00E+00  5.74E+08  0.00E+00
1997/12/31  000002.SZ  3.96E+09  1.97E+09  0.00E+00  5.48E+08  0.00E+00
1997/6/30   000002.SZ  3.62E+09  2.04E+09  0.00E+00  0.00E+00  0.00E+00
1996/12/31  000002.SZ  3.47E+09  1.94E+09  0.00E+00  8.28E+08  4.05E+07
1996/6/30   000002.SZ  3.22E+09  1.66E+09  0.00E+00  0.00E+00  0.00E+00
1995/12/31  000002.SZ  3.23E+09  1.75E+09  0.00E+00  6.47E+08  3.01E+07
1995/6/30   000002.SZ  2.83E+09  1.57E+09  0.00E+00  0.00E+00  0.00E+00
1994/12/31  000002.SZ  2.68E+09  1.44E+09  0.00E+00  4.08E+08  2.49E+05
1994/6/30   000002.SZ  2.24E+09  1.19E+09  0.00E+00  0.00E+00  0.00E+00
1993/12/31  000002.SZ  2.14E+09  1.14E+09  0.00E+00  3.37E+08  0.00E+00
1992/12/31  000002.SZ  9.63E+08  7.05E+08  0.00E+00  1.97E+08  0.00E+00
1970/1/1    000002.SZ  0.00E+00  0.00E+00  0.00E+00  0.00E+00  0.00E+00

Fund Dict

Balance sheet

Income Statement

Cash Flow Statement

Factorset Usage

Module contents

In all listed functions, the self argument is implicitly the currently-executing BaseFactor instance.

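class factorset.factors.BaseFactor(factor_name, tickers, factor_parameters, data_source, save_dir=None, mongolib=None)

Base class for factors, used to compute, store, and load factor values. To create a new factor, inherit from this class and implement the prepare_data and generate_factor methods.

clear_factor()

Cleans the factor values for the current day, mainly by:

  1. Filtering out infinite values (positive and negative infinity)
  2. Filtering out NaN values

Todo

Survivor bias check

  1. Filter out stocks that are not yet listed (an unlisted company may already have published financial reports, which can produce some values)
  2. Filter out stocks that have been delisted

Returns: the filtered factor values

generate_factor(trading_day)

Note

Subclasses must implement generate_factor, which computes the factor and returns its values.

Parameters: trading_day – the trading day being iterated over

generate_factor_and_store(from_date, to_date)

Computes the factor and writes it to the database.

Parameters:
  • from_date – (str) start date
  • to_date – (str) end date
Returns:

None

get_factor_name()

Gets the unique name of the factor.

Returns: (str) factor name

get_trading_days()

Gets the trading calendar used to compute the factor.

Returns: (list) trading calendar

prepare_data(from_date, to_date)

Note

Subclasses must implement prepare_data, which fetches all required data in a single pass.

Parameters:
  • from_date – start date of the raw data
  • to_date – end date of the raw data

save()

Saves the data.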
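
The division of labor between these methods can be sketched with a toy base class: prepare_data runs once, generate_factor runs for each trading day, and the result is cleaned before it is stored. `ToyBaseFactor` and `ConstantRatio` are hypothetical names invented for this sketch, and the dictionary store stands in for the real CSV or MongoDB backend; the actual BaseFactor may differ in its details.

```python
import math


class ToyBaseFactor:
    """Hypothetical stand-in for BaseFactor; not the factorset implementation."""

    def __init__(self, factor_name, trading_days):
        self.factor_name = factor_name
        self.trading_days = trading_days
        self.store = {}  # stands in for the CSV or MongoDB backend

    def prepare_data(self, from_date, to_date):
        raise NotImplementedError

    def generate_factor(self, trading_day):
        raise NotImplementedError

    def clear_factor(self, values):
        # Drop NaN and +/-inf values, mirroring the two cleaning rules above.
        return {k: v for k, v in values.items() if math.isfinite(v)}

    def generate_factor_and_store(self, from_date, to_date):
        self.prepare_data(from_date, to_date)  # load the raw data once
        for day in self.trading_days:
            if from_date <= day <= to_date:  # ISO date strings sort chronologically
                self.store[day] = self.clear_factor(self.generate_factor(day))


class ConstantRatio(ToyBaseFactor):
    def prepare_data(self, from_date, to_date):
        self.numerator = {"A": 1.0, "B": 2.0, "C": float("nan")}
        self.denominator = {"A": 4.0, "B": 0.0, "C": 1.0}

    def generate_factor(self, trading_day):
        out = {}
        for ticker, num in self.numerator.items():
            den = self.denominator[ticker]
            out[ticker] = num / den if den else math.inf
        return out


factor = ConstantRatio("ConstantRatio", ["2018-04-03", "2018-04-04", "2018-04-09"])
factor.generate_factor_and_store("2018-04-04", "2018-04-09")
```

Only ticker A survives cleaning in this example: B produces an infinite value from the zero denominator and C carries a NaN input.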

Generate Factors

To use factorset in a project, define the new factor as a subclass of BaseFactor:

import os
import pandas as pd
import tushare as ts
from factorset.factors import BaseFactor
from factorset.data.OtherData import code_to_symbol, shift_date, market_value
from factorset.data import CSVParser as cp
from factorset.Util.finance import ttmContinues

class NewFactor(BaseFactor):
    """
    :Name: NewFactor
    :Cal_Alg: NewFactor = blablabla
    :App: NewFactor can never blow up your account!

    """

    def __init__(self, factor_name='NewFactor', tickers='000016.SH', data_source='', factor_parameters={}, save_dir=None):
        # Initialize super class.
        super(NewFactor, self).__init__(factor_name=factor_name, tickers=tickers,
                                        factor_parameters=factor_parameters,
                                        data_source=data_source, save_dir=save_dir)

    def prepare_data(self, begin_date, end_date):
        # Placeholder: replace choose_a_dataset with a real CSVParser reader,
        # such as cp.concat_fund or cp.concat_stock.
        self.data = cp.choose_a_dataset()

    def generate_factor(self, trading_day):
        # Placeholder: replace awesome_func with your own factor calculation.
        factor_at_trading_day = awesome_func(self.data)

        return factor_at_trading_day

if __name__ == '__main__':
    from_dt = '2017-07-15'
    to_dt = '2018-04-09'

    # Use the CSI 300 constituents
    hs300 = ts.get_hs300s()
    hs300.code = hs300.code.apply(code_to_symbol)

    # Use a separate name for the instance so the NewFactor class is not shadowed.
    factor = NewFactor(
        factor_name='NewFactor',
        factor_parameters={},
        tickers=hs300.code.tolist(),
        save_dir='',
        data_source=os.path.abspath('.'),
    )

    factor.generate_factor_and_store(from_dt, to_dt)
    print('Factor built and stored successfully!')

Note

The data_source can be designated when using Arctic or MongoDB.

Factors Set

Accruals2price
class factorset.factors.Accruals2price.Accruals2price(factor_name='Accruals2price', tickers='000016.SH', data_source='', factor_parameters={}, save_dir=None)

Bases: factorset.factors.BaseFactor

Name: Accruals-to-price ratio
Calculation: accruals-to-price = (net profit_TTM - net cash flow from operating activities_TTM) / total market value
Application: A high accruals-to-price ratio may indicate that the company is overstating its sales, which may lead to future losses and hurt the share price.
Reference: O’Shaughnessy J P. What works on Wall Street: The classic guide to the best-performing investment strategies of all time[M]. McGraw Hill Professional, 2011.

generate_factor(trading_day)

Note

Subclasses must implement generate_factor, which computes the factor and returns its values.

Parameters: trading_day – the trading day being iterated over

prepare_data(begin_date, end_date)

Note

Subclasses must implement prepare_data, which fetches all required data in a single pass.

Parameters:
  • begin_date – start date of the raw data
  • end_date – end date of the raw data
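
AssetTurnover
class factorset.factors.AssetTurnover.AssetTurnover(factor_name='AssetTurnover', tickers='000016.SH', data_source='', factor_parameters={}, save_dir=None)

Bases: factorset.factors.BaseFactor

Name: Asset turnover
Calculation: revenue_TTM / total assets_TTM, where revenue_TTM is the sum of revenue over the latest 4 quarterly reporting periods and total assets_TTM is the average of total assets over the latest 5 quarterly reporting periods.
Application: The higher the asset turnover, the faster the company's total assets turn over, the stronger its sales capability, and the more efficiently it uses its assets.

generate_factor(date_str)

Note

Subclasses must implement generate_factor, which computes the factor and returns its values.

Parameters: date_str – the trading day being iterated over

prepare_data(begin_date, end_date)

Note

Subclasses must implement prepare_data, which fetches all required data in a single pass.

Parameters:
  • begin_date – start date of the raw data
  • end_date – end date of the raw data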
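
The AssetTurnover calculation combines a 4-quarter sum with a 5-quarter average. A minimal sketch of that arithmetic follows, assuming the revenue inputs are single-quarter figures (not year-to-date totals) ordered oldest first; the function name `asset_turnover` and the numbers are hypothetical.

```python
def asset_turnover(quarterly_revenue, quarter_end_assets):
    """Return revenue_TTM divided by average total assets.

    quarterly_revenue: single-quarter revenues, oldest first (at least 4 values).
    quarter_end_assets: total assets at quarter ends, oldest first (at least 5 values).
    """
    if len(quarterly_revenue) < 4 or len(quarter_end_assets) < 5:
        raise ValueError("need 4 quarters of revenue and 5 quarters of total assets")
    revenue_ttm = sum(quarterly_revenue[-4:])        # sum of the latest 4 quarters
    avg_assets = sum(quarter_end_assets[-5:]) / 5    # mean of the latest 5 quarter ends
    return revenue_ttm / avg_assets


ratio = asset_turnover([20.0, 25.0, 25.0, 30.0], [180.0, 190.0, 200.0, 210.0, 220.0])
```

Five balance-sheet points are needed because four quarters of flow are bounded by five quarter-end snapshots.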
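
Beta
class factorset.factors.Beta.Beta(factor_name='Beta_60D', tickers='000016.SH', factor_parameters={'benchmark': '000300', 'lagTradeDays': 60}, data_source='', save_dir=None)

Bases: factorset.factors.BaseFactor

Name: Beta coefficient
Calculation: Over the most recent sample window, compute the daily simple returns Xi of the given security and the daily simple returns Yi of the CSI 300 index, then estimate Beta by OLS regression.
Application: Beta is a statistic that measures the relationship between two time series. In financial data analysis, it measures the risk of an individual stock relative to the market.

generate_factor(end_day)

Note

Subclasses must implement generate_factor, which computes the factor and returns its values.

Parameters: end_day – the trading day being iterated over

prepare_data(begin_date, end_date)

Data preprocessing.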
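
For a single-regressor OLS fit with an intercept, the slope equals the covariance of the two return series divided by the variance of the benchmark returns. The sketch below computes Beta that way in plain Python; `ols_beta` and the sample returns are hypothetical, and the library's own estimator is not shown in this document.

```python
def ols_beta(stock_returns, benchmark_returns):
    """Slope of the OLS regression of stock returns on benchmark returns."""
    n = len(stock_returns)
    if n != len(benchmark_returns) or n < 2:
        raise ValueError("need two return series of equal length (at least 2 points)")
    mean_s = sum(stock_returns) / n
    mean_b = sum(benchmark_returns) / n
    cov = sum((s - mean_s) * (b - mean_b) for s, b in zip(stock_returns, benchmark_returns))
    var = sum((b - mean_b) ** 2 for b in benchmark_returns)
    if var == 0:
        raise ValueError("benchmark returns have zero variance")
    return cov / var


# Hypothetical daily returns: the stock moves twice as much as the benchmark.
benchmark = [0.01, -0.02, 0.015, 0.0, -0.005]
stock = [2 * r + 0.001 for r in benchmark]
beta = ols_beta(stock, benchmark)
```

The constant 0.001 added to the stock returns only shifts the intercept, so the estimated slope stays at 2.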

CATurnover
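class factorset.factors.CATurnover.CATurnover(factor_name='CATurnover', tickers='000016.SH', data_source='', factor_parameters={}, save_dir=None)

Bases: factorset.factors.BaseFactor

Name: Current asset turnover
Calculation: current asset turnover = revenue_TTM / total current assets_TTM, where revenue_TTM is the sum of revenue over the latest 4 quarterly reporting periods and total current assets_TTM is the average of total current assets over the latest 5 quarterly reporting periods.
Application: The higher the current asset turnover, the faster the company's current assets turn over and the better they are used. With faster turnover, fewer current assets are tied up, which is equivalent to expanding the investment in current assets and, to some extent, strengthens the company's ability to generate revenue.

generate_factor(end_day)

Generates factor data day by day.

Parameters: end_day – the date for which the factor is produced
Returns: ret – index is ticker, value is the factor value
Return type: pd.Series

prepare_data(begin_date, end_date)

Data preprocessing.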

CurrentRatio
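class factorset.factors.CurrentRatio.CurrentRatio(factor_name='CurrentRatio', tickers='000016.SH', data_source='', factor_parameters={}, save_dir=None)

Bases: factorset.factors.BaseFactor

Name: Current ratio; working capital ratio; real ratio
Calculation: current ratio = total current assets (latest report) / total current liabilities (latest report)
Application: The higher the current ratio, the more liquid the assets and the stronger the short-term solvency.

generate_factor(date_str)

Note

Subclasses must implement generate_factor, which computes the factor and returns its values.

Parameters: date_str – the trading day being iterated over

prepare_data(begin_date, end_date)

Note

Subclasses must implement prepare_data, which fetches all required data in a single pass.

Parameters:
  • begin_date – start date of the raw data
  • end_date – end date of the raw data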
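
EP_LYR
class factorset.factors.EP_LYR.EP_LYR(factor_name='EP_LYR', tickers='000016.SH', data_source='', factor_parameters={}, save_dir=None)

Bases: factorset.factors.BaseFactor

Name: Inverse of the static P/E ratio; inverse of the P/E ratio based on the latest annual report
Calculation: EP_LYR = net profit (excluding minority interests) from the latest annual report / total market value
Application: A lower P/E ratio means investors can buy the stock at a relatively low price.
Note: Market value is used here rather than price. This approach suits daily data because it avoids the price disturbances caused by ex-rights and ex-dividend adjustments.

generate_factor(trading_day)

Note

Subclasses must implement generate_factor, which computes the factor and returns its values.

Parameters: trading_day – the trading day being iterated over

prepare_data(begin_date, end_date)

Note

Subclasses must implement prepare_data, which fetches all required data in a single pass.

Parameters:
  • begin_date – start date of the raw data
  • end_date – end date of the raw data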
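
EP_TTM
class factorset.factors.EP_TTM.EP_TTM(factor_name='EP_TTM', tickers='000016.SH', data_source='', factor_parameters={}, save_dir=None)

Bases: factorset.factors.BaseFactor

Name: Inverse of the trailing-four-quarter (12-month) price-to-earnings ratio
Calculation: EP_TTM = net profit (excluding minority interests)_TTM / total market value
Application: A lower P/E ratio means investors can buy the stock at a relatively low price.

generate_factor(trading_day)

Note

Subclasses must implement generate_factor, which computes the factor and returns its values.

Parameters: trading_day – the trading day being iterated over

prepare_data(begin_date, end_date)

Data preprocessing.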

GPOA
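class factorset.factors.GPOA.GPOA(factor_name='GPOA', tickers='000016.SH', data_source='', factor_parameters={}, save_dir=None)

Bases: factorset.factors.BaseFactor

Name: Gross profit over total assets (GPOA)
Calculation: GPOA = (revenue - operating cost) / total assets
Application: Gross profit reflects a company's profitability: it is the value added to a product as it passes through the company's internal production process.

generate_factor(date_str)

Note

Subclasses must implement generate_factor, which computes the factor and returns its values.

Parameters: date_str – the trading day being iterated over

prepare_data(begin_date, end_date)

Note

Subclasses must implement prepare_data, which fetches all required data in a single pass.

Parameters:
  • begin_date – start date of the raw data
  • end_date – end date of the raw data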
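
GrossMarginTTM
class factorset.factors.GrossMarginTTM.GrossMarginTTM(factor_name='GrossMarginTTM', tickers='000016.SH', data_source='', factor_parameters={}, save_dir=None)

Bases: factorset.factors.BaseFactor

Name: Gross margin; gross profit margin on sales
Calculation: gross margin = (revenue_TTM - operating cost_TTM) / revenue_TTM, where revenue_TTM is the sum of revenue over the latest 4 quarterly reporting periods and operating cost_TTM is the sum of operating cost over the latest 4 quarterly reporting periods.
Application: A higher gross margin indicates stronger profitability and better cost control. However, gross margins are not very comparable across companies of different sizes and industries.

generate_factor(date_str)

Note

Subclasses must implement generate_factor, which computes the factor and returns its values.

Parameters: date_str – the trading day being iterated over

prepare_data(begin_date, end_date)

Note

Subclasses must implement prepare_data, which fetches all required data in a single pass.

Parameters:
  • begin_date – start date of the raw data
  • end_date – end date of the raw data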
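
InterestCover
class factorset.factors.InterestCover.InterestCover(factor_name='InterestCover', tickers='000016.SH', data_source='', factor_parameters={}, save_dir=None)

Bases: factorset.factors.BaseFactor

Name: Interest coverage ratio; interest coverage multiple (InterestCover)
Calculation: EBIT / interest expense, where EBIT = total profit + net interest expense and net interest expense = interest expense - interest income. If the notes on financial expenses are not disclosed, the financial expense figure is used directly.
Application: Interest coverage measures a company's ability to service its debt. It matters most when the company is in an earnings trough and its free cash flow is fragile: it shows whether the company can still pay interest and avoid default risk, and whether it retains the financing capacity to turn the situation around.

generate_factor(date_str)

Note

Subclasses must implement generate_factor, which computes the factor and returns its values.

Parameters: date_str – the trading day being iterated over

prepare_data(begin_date, end_date)

Note

Subclasses must implement prepare_data, which fetches all required data in a single pass.

Parameters:
  • begin_date – start date of the raw data
  • end_date – end date of the raw data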
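
LDebt2TA
class factorset.factors.LDebt2TA.LDebt2TA(factor_name='LDebt2TA', tickers='000016.SH', data_source='', factor_parameters={}, save_dir=None)

Bases: factorset.factors.BaseFactor

Name: Long-term liability ratio
Calculation: long-term liability ratio = long-term liabilities / total assets
Application: A lower long-term liability ratio indicates a lower degree of debt capitalization and less long-term repayment pressure; conversely, a higher ratio indicates a higher degree of debt capitalization and more long-term repayment pressure.

generate_factor(end_day)

Generates factor data day by day.

Parameters: end_day – the date for which the factor is produced
Returns: ret – index is ticker, value is the factor value
Return type: pd.Series

prepare_data(begin_date, end_date)

Data preprocessing.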

Momentum
class factorset.factors.Momentum.Momentum(factor_name='momentum_60D', tickers='000016.SH', factor_parameters={'lagTradeDays': 60}, data_source='', save_dir=None)

Bases: factorset.factors.BaseFactor

Name: Momentum factor; stock return
Calculation: The value equals the stock return over the last three months, computed from the adjusted closing prices on the current day and on the 63rd preceding trading day: Momentum_3M = dajclose_price(t) / dajclose_price(t-63) - 1

generate_factor(end_day)

Computes incremental factor data.

Parameters: end_day – the date for which the factor is produced
Returns: pd.Series, index is ticker, value is the factor value

prepare_data(begin_date, end_date)

Prepares the data needed to build the factor.

Parameters:
  • begin_date
  • end_date
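
The momentum formula is a plain price ratio over a fixed trading-day lag. A minimal sketch follows, assuming a list of adjusted closing prices ordered oldest first; the function name `momentum` and the prices are hypothetical, and the lag is passed in explicitly in the spirit of the lagTradeDays parameter.

```python
def momentum(adj_close, lag):
    """Return adj_close[t] / adj_close[t - lag] - 1 for the latest day t."""
    if lag <= 0:
        raise ValueError("lag must be a positive number of trading days")
    if len(adj_close) <= lag:
        raise ValueError("not enough price history for the requested lag")
    return adj_close[-1] / adj_close[-1 - lag] - 1


# Hypothetical adjusted closing prices, oldest first.
prices = [10.0, 10.5, 11.0, 12.0]
mom = momentum(prices, 3)  # compares 12.0 with 10.0
```

Adjusted prices are used so that dividends and share splits do not show up as artificial returns.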

NATurnover
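class factorset.factors.NATurnover.NATurnover(factor_name='NATurnover', tickers='000016.SH', data_source='', factor_parameters={}, save_dir=None)

Bases: factorset.factors.BaseFactor

Name: Net asset turnover
Calculation: NATurnover = revenue_TTM / netAsset_TTM, that is, net asset turnover = revenue_TTM / total net assets_TTM, where revenue_TTM is the sum of revenue over the latest 4 quarterly reporting periods and total net assets_TTM is the average of net assets over the latest 5 quarterly reporting periods.
Application: The higher the net asset turnover, the faster the company's net assets turn over, the stronger its sales capability, and the more efficiently it uses its assets.

generate_factor(end_day)

Note

Subclasses must implement generate_factor, which computes the factor and returns its values.

Parameters: end_day – the trading day being iterated over

prepare_data(begin_date, end_date)

Data preprocessing.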

QuickRatio
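class factorset.factors.QuickRatio.QuickRatio(factor_name='QuickRatio', tickers='000016.SH', data_source='', factor_parameters={}, save_dir=None)

Bases: factorset.factors.BaseFactor

Name: Quick ratio; acid-test ratio
Calculation: quick ratio = total quick assets (latest report) / total current liabilities (latest report), where quick assets = current assets - inventory (or, in the stricter definition, current assets - inventory - prepayments - deferred expenses)
Application: The quick ratio measures how much of a company's current assets can be converted to cash immediately to repay current liabilities. Quick assets include cash and cash equivalents, short-term investments, notes receivable, accounts receivable, and other receivables, all of which can be converted to cash within a short time.

generate_factor(date_str)

Note

Subclasses must implement generate_factor, which computes the factor and returns its values.

Parameters: date_str – the trading day being iterated over

prepare_data(begin_date, end_date)

Note

Subclasses must implement prepare_data, which fetches all required data in a single pass.

Parameters:
  • begin_date – start date of the raw data
  • end_date – end date of the raw data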
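
ROIC
class factorset.factors.ROIC.ROIC(factor_name='ROIC', tickers='000016.SH', data_source='', factor_parameters={}, save_dir=None)

Bases: factorset.factors.BaseFactor

Name: Return on invested capital
Calculation: ROIC = (net profit (excluding minority interests)_TTM + financial expenses_TTM) / invested capital_TTM, where invested capital = total assets - current liabilities + notes payable + short-term borrowings + long-term liabilities due within one year; net profit_TTM is the sum of net profit over the latest 4 quarterly reporting periods, and invested capital_TTM is the average of invested capital over the latest 5 quarterly reporting periods.
Application: Generally, a high return on invested capital indicates a strong or well-managed company. However, it may also mean that managers overemphasize revenue, neglect growth opportunities, and sacrifice long-term value.

generate_factor(end_day)

Note

Subclasses must implement generate_factor, which computes the factor and returns its values.

Parameters: end_day – the trading day being iterated over

prepare_data(begin_date, end_date)

Note

Subclasses must implement prepare_data, which fetches all required data in a single pass.

Parameters:
  • begin_date – start date of the raw data
  • end_date – end date of the raw data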

RoeGrowth1
class factorset.factors.RoeGrowth1.RoeGrowth1(factor_name='RoeGrowth1', tickers='000016.SH', data_source='', factor_parameters={}, save_dir=None)

Bases: factorset.factors.BaseFactor

Name: ROE (TTM) growth (versus the previous quarter)
Calculation: ROE growth = ROE (TTM) of the current quarter - ROE (TTM) of the previous quarter

generate_factor(end_day)

Note

Subclasses must implement generate_factor, which computes the factor and returns its values.

Parameters: end_day – the trading day being iterated over

prepare_data(begin_date, end_date)

Note

Subclasses must implement prepare_data, which fetches all required data in a single pass.

Parameters:
  • begin_date – start date of the raw data
  • end_date – end date of the raw data
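
RoeGrowth2
class factorset.factors.RoeGrowth2.RoeGrowth2(factor_name='RoeGrowth2', tickers='000016.SH', data_source='', factor_parameters={}, save_dir=None)

Bases: factorset.factors.BaseFactor

Name: ROE (TTM) growth (versus the same period last year)
Calculation: ROE growth = ROE (TTM) of the current period - ROE (TTM) of the same period last year

generate_factor(end_day)

Note

Subclasses must implement generate_factor, which computes the factor and returns its values.

Parameters: end_day – the trading day being iterated over

prepare_data(begin_date, end_date)

Note

Subclasses must implement prepare_data, which fetches all required data in a single pass.

Parameters:
  • begin_date – start date of the raw data
  • end_date – end date of the raw data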
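
TA2TL
class factorset.factors.TA2TL.TA2TL(factor_name='TA2TL', tickers='000016.SH', data_source='', factor_parameters={}, save_dir=None)

Bases: factorset.factors.BaseFactor

Name: Asset-to-liability ratio (as defined on Baidu Baike); the inverse of the debt-to-assets ratio
Calculation: asset-to-liability ratio = total assets / total liabilities
Application: The larger the share of assets, the lower the company's operating risk; relatively speaking, however, the company uses its capital less efficiently.

generate_factor(end_day)

Generates factor data day by day.

Parameters: end_day – the date for which the factor is produced
Returns: ret – index is ticker, value is the factor value
Return type: pd.Series

prepare_data(begin_date, end_date)

Data preprocessing.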

UnleverBeta
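class factorset.factors.UnleverBeta.UnleverBeta(factor_name='UnleverBeta_60D', tickers='000016.SH', factor_parameters={'benchmark': '000300', 'lagTradeDays': 60}, data_source='', save_dir=None)

Bases: factorset.factors.BaseFactor

Name: UnleverBeta factor: Beta with the financial leverage ratio removed (based on book values)
Calculation: UnleverBeta = Beta / (1 + total liabilities / shareholders' equity)
Application: Unlevered beta is intended as an intermediate value when estimating an actual beta, removing the effect of differences in capital structure. A company without financial leverage bears only operating risk, not financial risk, so its unlevered beta measures operating risk: the larger the unlevered beta, the greater the operating risk, the higher the return investors require, and the lower the P/E ratio.

generate_factor(end_day)

Note

Subclasses must implement generate_factor, which computes the factor and returns its values.

Parameters: end_day – the trading day being iterated over

prepare_data(begin_date, end_date)

Data preprocessing.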
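
The UnleverBeta formula can be checked with a short worked example. The sketch below applies the book-value formula exactly as stated, without a tax adjustment; the function name `unlever_beta` and the inputs are hypothetical.

```python
def unlever_beta(beta, total_liabilities, shareholders_equity):
    """Return Beta / (1 + total liabilities / shareholders' equity)."""
    if shareholders_equity <= 0:
        raise ValueError("shareholders' equity must be positive")
    return beta / (1 + total_liabilities / shareholders_equity)


# Hypothetical company: levered beta 1.2, liabilities 50, equity 100.
value = unlever_beta(1.2, 50.0, 100.0)  # 1.2 / 1.5
```

A company with no liabilities keeps its beta unchanged, since the denominator is then exactly 1.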

API Reference

Data Fetchers

Stock Data
factorset.Run.data_fetch.data_fetch()

Reads the settings from the configuration file and crawls pricing, fundamental, and other data.

factorset.data.StockSaver.write_all_stock(allAshare, lib=None)
Parameters:
  • allAshare – list, all stocks
  • lib – arctic.store.version_store.VersionStore
Returns:

succ: list, stocks written successfully; fail: list, stocks that failed to write

factorset.data.StockSaver.save_index(symbol)

Fetches index quotes from Tushare, for example 000905 (CSI 500) and 000300 (CSI 300).

Parameters: symbol – index code

Other Data
factorset.data.OtherData.market_value(dir, tickers)

Reads and computes total market value.

Parameters:
  • dir – str, storage path of the other data
  • tickers – list, stock codes
Returns:

pd.DataFrame: total market value

factorset.data.OtherData.tradecal(startday=None, endday=None)

Trading calendar.

Parameters:
  • startday – defaults to '2017-06-15'
  • endday – defaults to the most recent trading day
Returns:

list, trading calendar

Fundamental Data
class factorset.data.FundCrawler.FundCrawler(TYPE)

FundCrawler class, which crawls fundamental data with coroutines.

consume(queue)

Consumes tickers until all tasks are finished.

Parameters: queue – ticker queue
Returns: None

data_clean(text)

Cleans the crawled text data.

Parameters: text – text data crawled by a coroutine
Returns: pd.DataFrame

fetch(queue, session, url, ticker)

Crawls the fundamental data of a single ticker.

Parameters:
  • queue – ticker queue
  • session – aiohttp.ClientSession()
  • url – URL from which the stock's fundamental data is crawled
  • ticker – stock code
Returns:

fundamental data as text

get_proxy()

Gets a proxy.

Returns: proxy address

main(Ashare, num=10, retry=2)

Main coroutine crawling routine.

Parameters:
  • Ashare (list) – tickers to crawl
  • num (int) – maximum number of coroutines
  • retry (int) – number of retries
Returns:

None

run(queue, max_tasks)

Schedules the consumers.

Parameters:
  • queue – ticker queue
  • max_tasks – maximum number of coroutines
Returns:

None

write_to_MongoDB(symbol, df, source='Tushare')
Parameters:
  • symbol – ticker
  • df – fundamental data of a single ticker, pd.DataFrame
  • source – str, comment indicating the data source; defaults to 'Tushare'
Returns:

None
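
The queue-and-consumer pattern that these methods describe can be sketched with asyncio alone. In the example below, `fake_fetch` is a hypothetical stand-in for the real HTTP request (it fails once for one ticker to exercise the retry path), and `crawl` is an invented name that loosely mirrors main(Ashare, num, retry); this is an illustration of the pattern, not the FundCrawler code.

```python
import asyncio


async def fake_fetch(ticker, attempt):
    """Hypothetical stand-in for an HTTP request; fails on the first try for one ticker."""
    await asyncio.sleep(0)
    if ticker == "000001.SZ" and attempt == 0:
        raise ConnectionError("proxy refused the connection")
    return f"data for {ticker}"


async def consume(queue, results, retry):
    """Worker: take tickers from the queue until it is cancelled."""
    while True:
        ticker, attempt = await queue.get()
        try:
            results[ticker] = await fake_fetch(ticker, attempt)
        except ConnectionError:
            if attempt < retry:
                # Put the ticker back in the queue for another attempt.
                queue.put_nowait((ticker, attempt + 1))
        finally:
            queue.task_done()


async def crawl(tickers, num=10, retry=2):
    queue = asyncio.Queue()
    for ticker in tickers:
        queue.put_nowait((ticker, 0))
    results = {}
    workers = [asyncio.create_task(consume(queue, results, retry)) for _ in range(num)]
    await queue.join()  # returns once every queued item has been processed
    for worker in workers:
        worker.cancel()
    await asyncio.gather(*workers, return_exceptions=True)
    return results


results = asyncio.run(crawl(["000001.SZ", "000002.SZ"], num=2))
```

Re-queuing the failed ticker before calling task_done keeps the queue's unfinished count above zero, so queue.join() does not return while a retry is still pending.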

Data Reader

The following methods are available for use in the prepare_data (recommended) and generate_factor API functions.

Stock Data
factorset.data.CSVParser.all_stock_symbol(dir)
Parameters: dir (string) – data path
Returns: all stock tickers found under the path

factorset.data.CSVParser.read_stock(dir, ticker)
Parameters:
  • dir (string) – data path
  • ticker – a single stock ticker
Returns:

quotes of a single stock, pd.DataFrame

factorset.data.CSVParser.concat_stock(dir, tickers)

Vertically concatenates the quotes of the specified stocks in the directory.

Parameters:
  • dir (string) – data path
  • tickers – list, stock tickers
Return type:

pd.DataFrame

factorset.data.CSVParser.concat_all_stock(dir)

Vertically concatenates the quotes of all stocks in the directory.

Parameters: dir (string) – data path
Returns: pd.DataFrame

factorset.data.CSVParser.hconcat_stock_series(hq, tickers)

Horizontally concatenates stock quotes.

Parameters:
  • hq (pd.DataFrame) – the DataFrame returned by concat_all_stock
  • tickers (list) – stock tickers
Return type:

pd.DataFrame

Other Data
factorset.data.OtherData.write_new_stocks()

Fetches daily new-stock data from Tushare. Because of Tushare data limits, data is available back to 2016-04-26 at the earliest.

Returns: None

factorset.data.OtherData.write_all_date(tc, lib=None)
Parameters:
  • tc – list, all dates
  • lib – arctic.store.version_store.VersionStore
Returns:

succ: List, written stocks; fail: List, failed written stocks

Fundamental Data
factorset.data.CSVParser.all_fund_symbol(dir, type)

Gets all tickers for one type of statement in the storage path.

Parameters:
  • dir (string) – data path
  • type – 'BS', 'IS', or 'CF'
Returns:

tickers

Return type:

list

factorset.data.CSVParser.read_fund(dir, type, ticker)

Reads one type of statement for a single stock.

Parameters:
  • dir (string) – data path
  • type – 'BS', 'IS', or 'CF'
  • ticker – str, stock ticker
Return type:

pd.DataFrame

factorset.data.CSVParser.fund_collist(dir, type)

Lists the accounting items of all stocks for one type of statement.

Parameters:
  • dir (string) – data path
  • type – 'BS', 'IS', or 'CF'
Return type:

list

factorset.data.CSVParser.concat_fund(dir, tickers, type)

Vertically concatenates one type of financial statement.

Parameters:
  • dir (string) – data path
  • tickers – list, stock tickers
  • type – 'BS', 'IS', or 'CF'
Return type:

pd.DataFrame

Data Util

factorset.data.OtherData.code_to_symbol(code)

Generates the symbol form of a stock code.

Parameters: code – numeric code
Returns: str, stock code
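
The exact rule used by code_to_symbol is not shown in this document. As a sketch of what such a conversion usually looks like, the function below appends an exchange suffix under the common A-share convention that codes starting with 5, 6, or 9 trade in Shanghai and the rest in Shenzhen. Both the name `code_to_symbol_sketch` and the prefix rule are assumptions, and the rule does not cover index codes such as 000016.SH.

```python
def code_to_symbol_sketch(code):
    """Append an exchange suffix to a numeric A-share stock code (assumed convention)."""
    text = str(code).zfill(6)  # restore leading zeros lost when the code is an int
    if len(text) != 6 or not text.isdigit():
        raise ValueError(f"not a 6-digit stock code: {code!r}")
    # Assumption: 5/6/9 prefixes are Shanghai listings, everything else Shenzhen.
    suffix = "SH" if text[0] in "569" else "SZ"
    return f"{text}.{suffix}"


shanghai = code_to_symbol_sketch("600000")
shenzhen = code_to_symbol_sketch(1)
```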

factorset.data.OtherData.shift_date(date_str, n)
Parameters:
  • date_str – date, a string in 'YYYYMMDD' format
  • n – int, time span
Returns:

the adjusted trading day, date

factorset.Util.finance.ttmContinues(report_df, label)

Computes trailing twelve months (TTM) values for an indicator.

Computation rules:
  1. The TTM indicator is computed on the announcement date.
  2. On a given release_date, the latest report_date and the previous report year are used for the computation.
  3. If any report period is missing, a weighted method is used.
  4. If two reports (usually the first-quarter and annual reports) are released together, only the latest is kept.
Parameters:
  • report_df (pandas.DataFrame) – must have 'report_date', 'release_date', and <label> columns
  • label (str) – column name of the intended indicator
Returns:

DataFrame with columns ['datetime', 'report_date', <label>+'_TTM', …]

Return type:

pandas.DataFrame

Todo

If announce_date exists, use announce_date instead of release_date; likewise for report_date.
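
Rule 2 can be illustrated with the standard TTM identity for year-to-date figures: TTM = current year-to-date value + previous full-year value - previous year's value for the same period. The sketch below assumes the reported values are cumulative within each fiscal year; `ttm_from_cumulative` and the numbers are hypothetical, and the weighted fallback of rule 3 is not covered.

```python
def ttm_from_cumulative(reports, year, quarter):
    """Compute a TTM value from year-to-date figures.

    reports maps (year, quarter) to the cumulative value reported for that period.
    A missing period raises KeyError; the weighted fallback is not implemented here.
    """
    current = reports[(year, quarter)]
    if quarter == 4:
        return current  # a full-year figure already covers twelve months
    return current + reports[(year - 1, 4)] - reports[(year - 1, quarter)]


# Hypothetical cumulative net profit: H1 2016 = 40, full-year 2016 = 100, H1 2017 = 55.
reports = {(2016, 2): 40.0, (2016, 4): 100.0, (2017, 2): 55.0}
ttm = ttm_from_cumulative(reports, 2017, 2)  # 55 + 100 - 40
```

Subtracting the prior-year period removes the months that fall outside the trailing twelve-month window.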

factorset.Util.finance.ttmDiscrete(report_df, label_str, min_report_num=4)
Parameters:
  • report_df (pandas.DataFrame) – must have 'report_date', 'release_date', and <label> columns
  • label_str
  • min_report_num (int)
Returns:

DataFrame with columns ['datetime', 'report_date', <label>+'_TTM', …]

Return type:

pd.DataFrame

Contributing

Contributions are welcome, and they are greatly appreciated! Every little bit helps, and credit will always be given.

You can contribute in many ways:

Types of Contributions

Report Bugs

Report bugs at https://github.com/quantasset/factorset/issues.

If you are reporting a bug, please include:

  • Your operating system name and version.
  • Any details about your local setup that might be helpful in troubleshooting.
  • Detailed steps to reproduce the bug.
Fix Bugs

Look through the GitHub issues for bugs. Anything tagged with “bug” and “help wanted” is open to whoever wants to implement it.

Implement Features

Look through the GitHub issues for features. Anything tagged with “enhancement” and “help wanted” is open to whoever wants to implement it.

Write Documentation

factorset could always use more documentation, whether as part of the official factorset docs, in docstrings, or even on the web in blog posts, articles, and such.

Submit Feedback

The best way to send feedback is to file an issue at https://github.com/quantasset/factorset/issues.

If you are proposing a feature:

  • Explain in detail how it would work.
  • Keep the scope as narrow as possible, to make it easier to implement.
  • Remember that this is a volunteer-driven project, and that contributions are welcome :)

Get Started!

Ready to contribute? Here’s how to set up factorset for local development.

  1. Fork the factorset repo on GitHub.

  2. Clone your fork locally:

    $ git clone git@github.com:your_name_here/factorset.git
    
  3. Install your local copy into a virtualenv. Assuming you have virtualenvwrapper installed, this is how you set up your fork for local development:

    $ mkvirtualenv factorset
    $ cd factorset/
    $ python setup.py develop
    
  4. Create a branch for local development:

    $ git checkout -b name-of-your-bugfix-or-feature
    

    Now you can make your changes locally.

  5. When you’re done making changes, check that your changes pass flake8 and the tests, including testing other Python versions with tox:

    $ flake8 factorset tests
    $ python setup.py test or py.test
    $ tox
    

    To get flake8 and tox, just pip install them into your virtualenv.

  6. Commit your changes and push your branch to GitHub:

    $ git add .
    $ git commit -m "Your detailed description of your changes."
    $ git push origin name-of-your-bugfix-or-feature
    
  7. Submit a pull request through the GitHub website.

Pull Request Guidelines

Before you submit a pull request, check that it meets these guidelines:

  1. The pull request should include tests.
  2. If the pull request adds functionality, the docs should be updated. Put your new functionality into a function with a docstring, and add the feature to the list in README.rst.

Tips

To run a subset of tests:

$ py.test tests.test_factorset

Deploying

A reminder for the maintainers on how to deploy. Make sure all your changes are committed (including an entry in HISTORY.rst). Then run:

$ bumpversion patch # possible: major / minor / patch
$ git push
$ git push --tags

Travis will then deploy to PyPI if tests pass.

Credits

Development Lead

Contributors

None yet. Why not be the first?

History

0.0.2 (2018-05-04)

  • Add Factors module with 20 sample factors.
  • Improved data collection module.

0.0.1 (2018-04-19)

  • First release on PyPI.