Python and Data Science

Note

These are the study notes Scott wrote while learning data science. Discussion and corrections are welcome.

Contents:

WiFi Modules

This part provides the user manuals for the ESP WiFi modules.

ESP-F

Date:2018-01-26

This section provides the user manual for the ESP-F, which can be ordered from our shop: www.vvdoit.com

Version history
Date       Version  Content
3-14-2017  V1.0     Initial release
3-18-2017  V1.1     Added the recommended PCB design
9-11-2018  V1.2     Optimized the minimum system
Introduction

The ESP-F WiFi module is built around the high-performance ESP8266EX chip, shown in Fig. 1. The chip integrates an enhanced Tensilica L106 Diamond series 32-bit processor core with on-chip SRAM. The ESP8266 therefore provides complete Wi-Fi functionality: it can run standalone, or it can work as a slave to another host CPU. When the ESP8266 hosts the application itself, it boots directly from the onboard Flash. The built-in high-speed cache improves system performance and reduces memory requirements. In addition, the ESP8266 can serve as a Wi-Fi adapter in any other MCU-based design through its SPI/SDIO or I2C/UART interface.

ESP-F Module Structure

The ESP-F module supports the standard IEEE802.11 b/g/n/e/i protocols and a complete TCP/IP protocol stack. Users can add WiFi to existing devices with it, or use it as an independent network controller. The ESP-F module thus offers many possibilities at a low price.

How to Order

Please click this link: ESP-F;

Features

SOC characteristics
  • Built-in Tensilica L106 ultra-low-power 32-bit CPU; the main frequency can be 80MHz or 160MHz, and RTOS is supported;
  • Built-in TCP/IP protocol stack;
  • Built-in 1-channel 10-bit high-precision ADC;
  • External interfaces: HSPI, UART, I2C, I2S, IR Remote Control, PWM, GPIO;
  • The deep-sleep current is about 10uA, and the shutdown current is less than 5uA;
  • Wakes up, connects, and transmits a data packet within 2 ms;
  • Standby power consumption is less than 1.0mW (DTIM3);
Wi-Fi characteristics
  • Supports 802.11 b/g/n/e/i;
  • Supports three modes: Station, SoftAP, and SoftAP+STA;
  • Supports Wi-Fi Direct (P2P);
  • Supports hardware acceleration for CCMP (CBC-MAC, counter mode), TKIP (MIC, RC4), WAPI (SMS4), WEP (RC4), and CRC;
  • P2P discovery, P2P GO mode/GC mode, and P2P power management;
  • WPA/WPA2 PSK and WPS;
  • Supports 802.11i security: pre-authentication and TSN;
  • Supports 802.11n (2.4 GHz);
  • 802.1h/RFC1042 frame encapsulation;
  • Supports seamless roaming;
  • Supports remote update via AT commands and cloud OTA update;
  • Supports SmartConfig for Android and iOS devices.
Peripheral for Module
  • 2*UART;
  • 1*En;
  • 1*ADC
  • 1*wakeup pin
  • 1*HSPI
  • 1*I2C
  • 1*I2S
  • 4M byte Flash
  • MAX 11* GPIOs;
  • Working temperature: -40℃-85℃
  • Module size: 16mm*24mm;
Applications
  • Serial Transparent transmission;
  • Smart power plug/Smart LED light;
  • Sensor networks;
  • Wearable electronics;
  • Security ID tags;
  • Wireless location recognition;
  • Wireless location system beacons;
  • WiFi probes;
  • Mesh networks;
  • Industrial wireless control.
Module Type
Name Antenna Type
ESP-F PCB on board antenna
Parameters

Parameters for ESP-F are listed as follows.

Type      Item                     Parameter
WiFi      Frequency range          2.4 GHz ~ 2.5 GHz (2400 MHz ~ 2483.5 MHz)
          Transmit power           802.11b: +20 dBm
                                   802.11g: +17 dBm
                                   802.11n: +14 dBm
          Receiving sensitivity    802.11b: -91 dBm (11 Mbps)
                                   802.11g: -75 dBm (54 Mbps)
                                   802.11n: -72 dBm (MCS7)
          Antenna                  PCB onboard antenna
Hardware  CPU                      Tensilica L106 32-bit MCU
          Peripherals              UART/SDIO/SPI/I2C/I2S/IR control
                                   GPIO/ADC/PWM/SPI/I2C/I2S
          Working voltage          2.5 V ~ 3.6 V
          Working current          Average current: 80 mA
          Working temperature      -40°C ~ 85°C
          Environment temperature  -40°C ~ 85°C
          Size                     16 mm x 24 mm x 3 mm
Software  Wi-Fi mode               Station/SoftAP/SoftAP+Station
          Security mode            WPA/WPA2
          Encryption type          WEP/TKIP/AES
          Firmware update          UART download/OTA (over the internet)
          Software development     Non-RTOS/RTOS/Arduino IDE etc.
          Network protocols        IPv4, TCP/UDP/HTTP/FTP/MQTT
          User configuration       AT commands/cloud server/Android/iOS app

Pin Definitions

The pin definitions of the ESP-F are shown below.

Pins definition for ESP-F Module

Selection of Working Mode
Working modes and the corresponding pin levels:
Mode           GPIO15  GPIO0  GPIO2
UART download  low     low    high
Flash boot     low     high   high
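The strapping table above can be read as a small lookup. The Python sketch below is illustrative only: the function name `boot_mode` and the returned labels are hypothetical, and any pin combination that the table does not list is reported as unknown rather than guessed.

```python
def boot_mode(gpio15: bool, gpio0: bool, gpio2: bool) -> str:
    """Map strapping-pin levels (True = high, False = low) to a boot mode."""
    # Only the two combinations listed in the table are known here.
    table = {
        (False, False, True): "UART download",
        (False, True, True): "Flash boot",
    }
    return table.get((gpio15, gpio0, gpio2), "unknown (not listed in the table)")


print(boot_mode(False, False, True))
print(boot_mode(False, True, True))
```

In practice this means GPIO0 is the pin that selects between downloading firmware and running it, while GPIO15 stays low and GPIO2 stays high in both listed modes.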
Function Definition of Module Pins
Num  Pin   Type  Function
1    RST   I     External reset signal (active low); resets the module
2    ADC   I     A/D input pin. Input voltage 0~1V, value range 0~1024
3    EN    I     High level: chip works; low level: chip is shut down and draws only a small current
4    IO16  I/O   Deep-sleep wakeup
5    IO14  I/O   GPIO14; HSPI_CLK
6    IO12  I/O   GPIO12; HSPI_MISO
7    IO13  I/O   GPIO13; HSPI_MOSI; UART0_CTS
8    VCC   P     Module supply voltage: 3.3V
9    CS0   I/O   GPIO11; SD_CMD; SPI_CS0
10   MISO  I/O   GPIO7; SD_D0; SPI_MISO
11   IO9   I/O   GPIO9; SD_D2; SPIHD; HSPIHD
12   IO10  I/O   GPIO10; SD_D3; SPIWP; HSPIWP
13   MOSI  I/O   GPIO8; SD_D1; SPI_MOSI
14   SCLK  I/O   GPIO6; SD_CLK; SPI_CLK
15   GND   P     GND
16   IO15  I/O   GPIO15; MTDO; HSPICS; UART0_RTS
17   IO2   I/O   GPIO2; UART1_TXD
18   IO0   I/O   GPIO0; SPI_CS2
19   IO4   I/O   GPIO4
20   IO5   I/O   GPIO5
21   RXD   I/O   GPIO3; UART Rx when programming the built-in Flash
22   TXD   I/O   GPIO1; UART Tx when programming the built-in Flash
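The ADC pin maps an input of 0~1V to a reading of 0~1024. A minimal sketch of the conversion, assuming the reading scales linearly with the input voltage; the function name `adc_to_volts` and the constants are hypothetical and not part of any SDK.

```python
ADC_FULL_SCALE = 1024  # largest reading, taken from the pin table
ADC_MAX_VOLTS = 1.0    # upper end of the input range, taken from the pin table


def adc_to_volts(raw: int) -> float:
    """Convert a raw ADC reading to volts, assuming a linear response."""
    if not 0 <= raw <= ADC_FULL_SCALE:
        raise ValueError(f"reading {raw} is outside 0..{ADC_FULL_SCALE}")
    return raw / ADC_FULL_SCALE * ADC_MAX_VOLTS


print(adc_to_volts(512))  # 0.5
```

Signals above 1V need an external voltage divider before they reach the pin; the divider ratio would then multiply the result.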
Shape and Size

The shape and size of this module are shown below. The module measures 16mm*24mm*3mm and carries 4M bytes (32 Mbits) of Flash.

ESP-F Module

Length Width Height Pin Distance between pins
24.5mm 14mm 3mm 4x2 2.54mm
Electrical Characteristics

Please refer to the following table.

Parameter                       Condition            Min           Typical  Max          Unit
Storage temperature             N/A                  -40           Normal   125          °C
Soldering temperature           IPC/JEDEC J-STD-020  N/A           N/A      260          °C
Working voltage                 N/A                  2.5           3.3      3.6          V
I/O  VIL/VIH                    N/A                  -0.3/0.75VIO  N/A      0.25VIO/3.6  V
     VOL/VOH                    N/A                  N/0.8VIO      N/A      0.1VIO/N     V
     IMAX                       N/A                  N/A           N/A      12           mA
Electrostatic discharge (BODY)  TAMB=25°C            N/A           N/A      2            kV
Electrostatic discharge (BODY)  TAMB=25°C            N/A           N/A      0.5          kV

Please refer to the following table.

Power Consumption
Parameter                                 Min  Typical  Max  Unit
Tx 802.11b, CCK 11 Mbps, POUT = +17 dBm   N/A  170      N/A  mA
Tx 802.11g, OFDM 54 Mbps, POUT = +15 dBm  N/A  140      N/A  mA
Tx 802.11n, MCS7, POUT = +13 dBm          N/A  120      N/A  mA
Rx 802.11b, 1024 bytes, -80 dBm           N/A  50       N/A  mA
Rx 802.11g, 1024 bytes, -70 dBm           N/A  56       N/A  mA
Rx 802.11n, 1024 bytes, -65 dBm           N/A  56       N/A  mA
Modem-sleep①                              N/A  15       N/A  mA
Light-sleep②                              N/A  0.9      N/A  mA
Deep-sleep③                               N/A  20       N/A  µA
Power off                                 N/A  0.5      N/A  µA

① Modem-sleep is for cases where the CPU must keep running, e.g., PWM or I2S applications. While Wi-Fi is connected and there is no data to transmit, the Wi-Fi modem can be shut down to save power. For example, at DTIM3 the module sleeps for 300 ms and then wakes for 3 ms to receive the AP's Beacon packets, and the average current is about 15 mA.

② Light-sleep is for applications whose CPU can be suspended temporarily, e.g., a Wi-Fi switch. While Wi-Fi is connected and there is no data packet to transmit, the module can, in line with the 802.11 standard (e.g., U-APSD), shut down the Wi-Fi modem and suspend the CPU to save power. For example, at DTIM3 the module sleeps for 300 ms and then wakes for 3 ms to receive the AP's Beacon packets, and the overall average current is about 0.9 mA.

③ Deep-sleep is for applications that do not need to stay connected and only send a data packet at long intervals (e.g., one temperature reading every 100 s). If the module sleeps for 300 s and then needs 0.3 s to 1 s to connect to the AP, the overall average current is much smaller than 1 mA.
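The "much smaller than 1 mA" figure for deep sleep can be checked with a time-weighted average. The sketch below rests on stated assumptions rather than measured values: the module is awake for 1 s (the upper end of the 0.3 s to 1 s range) out of every 300 s, it draws the 80 mA average working current from the Parameters table while awake, and it draws 0.02 mA (20 µA) while asleep. The function name `average_current_ma` is hypothetical.

```python
def average_current_ma(active_ma: float, active_s: float,
                       sleep_ma: float, period_s: float) -> float:
    """Time-weighted average current over one wake/sleep cycle."""
    sleep_s = period_s - active_s
    return (active_ma * active_s + sleep_ma * sleep_s) / period_s


# Assumed figures: 80 mA for 1 s, then 0.02 mA for the remaining 299 s.
avg = average_current_ma(active_ma=80.0, active_s=1.0,
                         sleep_ma=0.02, period_s=300.0)
print(f"{avg:.3f} mA")  # 0.287 mA
```

Under these assumptions the short active burst dominates the average, so shortening the connection time matters far more than shaving the sleep current.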

Wi-Fi RF Characteristics

The data in the following table were measured at supply voltages of 3.3V and 1.1V, at room temperature.

Parameter                       Min   Typical  Max   Unit
Input frequency                 2412  N/A      2484  MHz
Input impedance                 N/A   50       N/A   Ω
Input reflection                N/A   N/A      -10   dB
PA output power at 72.2 Mbps    15.5  16.5     17.5  dBm
PA output power in 11b mode     19.5  20.5     21.5  dBm
Sensitivity
DSSS, 1 Mbps                    N/A   -98      N/A   dBm
CCK, 11 Mbps                    N/A   -91      N/A   dBm
6 Mbps (1/2 BPSK)               N/A   -93      N/A   dBm
54 Mbps (3/4 64-QAM)            N/A   -75      N/A   dBm
HT20, MCS7 (65 Mbps, 72.2 Mbps) N/A   -72      N/A   dBm
Adjacent channel rejection
OFDM, 6 Mbps                    N/A   37       N/A   dB
OFDM, 54 Mbps                   N/A   21       N/A   dB
HT20, MCS0                      N/A   37       N/A   dB
HT20, MCS7                      N/A   22       N/A   dB
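The power figures in this table are given in dBm, a logarithmic unit referenced to 1 mW. A short sketch of the standard conversion P(mW) = 10^(dBm/10), applied to the typical output powers listed above; the function name `dbm_to_mw` is hypothetical.

```python
def dbm_to_mw(dbm: float) -> float:
    """Convert a power level in dBm to milliwatts."""
    return 10 ** (dbm / 10)


# Typical PA output powers taken from the table above.
for label, dbm in [("11b mode", 20.5), ("72.2 Mbps", 16.5)]:
    print(f"{label}: {dbm} dBm = {dbm_to_mw(dbm):.1f} mW")
```

Every 10 dB step is a factor of ten in power, so 20 dBm corresponds to 100 mW and 0 dBm to 1 mW.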
Minimum System
Minimum for ESP-F

  1. The module's working voltage is DC 3.3V;
  2. The maximum current from an IO pin of this module is 12mA;
  3. The RST pin is active at low level, and the EN pin is active at high level;
  4. To enter update mode, hold GPIO0 low and then reset or power-cycle the module; to enter working mode, hold GPIO0 high and then reset or power-cycle the module;
  5. The module's TXD is connected to the RXD of the other MCU, and the module's RXD is connected to that MCU's TXD.
ESP-S

ESP-M1/M2

Sales information is available in our shop: M1: www.vvdoit.com, and M2: www.vvdoit.com

Introduction

The ESP-M WiFi module is built around the high-performance ESP8285 chip. The chip integrates an enhanced Tensilica L106 Diamond series 32-bit processor core with on-chip SRAM. The ESP8285 therefore provides complete Wi-Fi functionality: it can run standalone, or it can work as a slave to another host CPU. When the ESP8285 hosts the application itself, it boots directly from the onboard Flash. The built-in high-speed cache improves system performance and reduces memory requirements. In addition, the ESP8285 can serve as a Wi-Fi adapter in any other MCU-based design through its SPI/SDIO or I2C/UART interface.

The ESP-M module supports the standard IEEE802.11 b/g/n/e/i protocols and a complete TCP/IP protocol stack. Users can add WiFi to existing devices with it, or use it as an independent network controller. The ESP-M module thus offers many possibilities at a low price.

ESP8285-M Module Structure
Features
SOC characteristics
  • Built-in Tensilica L106 ultra-low-power 32-bit CPU; the main frequency can be 80MHz or 160MHz, and RTOS is supported;
  • Built-in TCP/IP protocol stack;
  • Built-in 1-channel 10-bit high-precision ADC;
  • External interfaces: HSPI, UART, I2C, I2S, IR Remote Control, PWM, GPIO;
  • The deep-sleep current is about 10uA, and the shutdown current is less than 5uA;
  • Wakes up, connects, and transmits a data packet within 2 ms;
  • Standby power consumption is less than 1.0mW (DTIM3);
  • Built-in 1M byte SPI Flash.
Wi-Fi characteristics
  • Supports 802.11 b/g/n/e/i;
  • Supports three modes: Station, SoftAP, and SoftAP+STA;
  • Supports Wi-Fi Direct (P2P);
  • Supports hardware acceleration for CCMP (CBC-MAC, counter mode), TKIP (MIC, RC4), WAPI (SMS4), WEP (RC4), and CRC;
  • P2P discovery, P2P GO mode/GC mode, and P2P power management;
  • WPA/WPA2 PSK and WPS;
  • Supports 802.11i security: pre-authentication and TSN;
  • Supports 802.11n (2.4 GHz);
  • 802.1h/RFC1042 frame encapsulation;
  • Supports seamless roaming;
  • Supports remote update via AT commands and cloud OTA update;
  • Supports SmartConfig for Android and iOS devices.
Peripheral for Module
  • 2*UART;
  • 1*En;
  • 1*ADC;
  • 1*wakeup pin;
  • 1*HSPI;
  • 1*I2C;
  • 1*I2S;
  • MAX 10* GPIOs;
  • Working temperature: -40℃-125℃

  • Module size: 12.3mm*15mm (M1 version); 12.3mm*20mm (M2 version).

Application
  • Serial Transparent transmission;
  • Smart power plug/Smart LED light;
  • Sensor networks;
  • Wearable electronics;
  • Security ID tags;
  • Wireless location recognition;
  • Wireless location system beacons;
  • WiFi probes;
  • Mesh networks;
  • Industrial wireless control.
Module Type
Name Antenna Type
ESP-M1 IPEX external antenna
ESP-M2 PCB on board antenna
Parameters
Type      Item                     Parameter
WiFi      Frequency range          2.4 GHz ~ 2.5 GHz (2400 MHz ~ 2483.5 MHz)
          Transmit power           802.11b: +20 dBm
                                   802.11g: +17 dBm
                                   802.11n: +14 dBm
          Receiving sensitivity    802.11b: -91 dBm (11 Mbps)
                                   802.11g: -75 dBm (54 Mbps)
                                   802.11n: -72 dBm (MCS7)
          Antenna                  PCB onboard antenna
Hardware  CPU                      Tensilica L106 32-bit MCU
          Peripherals              UART/SDIO/SPI/I2C/I2S/IR control
                                   GPIO/ADC/PWM/SPI/I2C/I2S
          Working voltage          2.5 V ~ 3.6 V
          Working current          Average current: 80 mA
          Working temperature      -40°C ~ 85°C
          Environment temperature  -40°C ~ 85°C
          Size                     12 mm x 15 mm x 3 mm
Software  Wi-Fi mode               Station/SoftAP/SoftAP+Station
          Security mode            WPA/WPA2
          Encryption type          WEP/TKIP/AES
          Firmware update          UART download/OTA (over the internet)
          Software development     Non-RTOS/RTOS/Arduino IDE etc.
          Network protocols        IPv4, TCP/UDP/HTTP/FTP/MQTT
          User configuration       AT commands/cloud server/Android/iOS app

Pin Definitions

The pin definitions of the ESP-M modules are shown below.

Pins definition for ESP-M1 Module

Pins definition for ESP-M2 Module

Selection of Working Mode
Working modes and the corresponding pin levels:
Mode           GPIO15 (via resistor)  GPIO0  GPIO2
UART download  low                    low    high
Flash boot     low                    high   high
Function Definition of Module Pins
Shape and Size

The shape and size of the ESP-M1 and ESP-M2 modules are shown in the following pictures and tables.

ESP-M1 Module

ESP-M2 Module

Size of ESP-M1 module
Length Width Height PAD Size(bottom) Distance between pins
12.3mm 15mm 3mm 0.9*1.7mm 1.5mm
Size of ESP-M2 module
Length Width Height PAD Size(bottom) Distance between pins
12.3mm 20mm 3mm 0.9*1.7mm 1.5mm
Electrical Characteristics

Please refer to the following table.

Parameter                       Condition            Min           Typical  Max          Unit
Storage temperature             N/A                  -40           Normal   125          °C
Soldering temperature           IPC/JEDEC J-STD-020  N/A           N/A      260          °C
Working voltage                 N/A                  2.5           3.3      3.6          V
I/O  VIL/VIH                    N/A                  -0.3/0.75VIO  N/A      0.25VIO/3.6  V
     VOL/VOH                    N/A                  N/0.8VIO      N/A      0.1VIO/N     V
     IMAX                       N/A                  N/A           N/A      12           mA
Electrostatic discharge (BODY)  TAMB=25°C            N/A           N/A      2            kV
Electrostatic discharge (BODY)  TAMB=25°C            N/A           N/A      0.5          kV

Please refer to the following table.

Power Consumption
Parameter                                 Min  Typical  Max  Unit
Tx 802.11b, CCK 11 Mbps, POUT = +17 dBm   N/A  170      N/A  mA
Tx 802.11g, OFDM 54 Mbps, POUT = +15 dBm  N/A  140      N/A  mA
Tx 802.11n, MCS7, POUT = +13 dBm          N/A  120      N/A  mA
Rx 802.11b, 1024 bytes, -80 dBm           N/A  50       N/A  mA
Rx 802.11g, 1024 bytes, -70 dBm           N/A  56       N/A  mA
Rx 802.11n, 1024 bytes, -65 dBm           N/A  56       N/A  mA
Modem-sleep①                              N/A  15       N/A  mA
Light-sleep②                              N/A  0.9      N/A  mA
Deep-sleep③                               N/A  20       N/A  µA
Power off                                 N/A  0.5      N/A  µA

① Modem-sleep is for cases where the CPU must keep running, e.g., PWM or I2S applications. While Wi-Fi is connected and there is no data to transmit, the Wi-Fi modem can be shut down to save power. For example, at DTIM3 the module sleeps for 300 ms and then wakes for 3 ms to receive the AP's Beacon packets, and the average current is about 15 mA.

② Light-sleep is for applications whose CPU can be suspended temporarily, e.g., a Wi-Fi switch. While Wi-Fi is connected and there is no data packet to transmit, the module can, in line with the 802.11 standard (e.g., U-APSD), shut down the Wi-Fi modem and suspend the CPU to save power. For example, at DTIM3 the module sleeps for 300 ms and then wakes for 3 ms to receive the AP's Beacon packets, and the overall average current is about 0.9 mA.

③ Deep-sleep is for applications that do not need to stay connected and only send a data packet at long intervals (e.g., one temperature reading every 100 s). If the module sleeps for 300 s and then needs 0.3 s to 1 s to connect to the AP, the overall average current is much smaller than 1 mA.

Wi-Fi RF Characteristics

The data in the following table were measured at supply voltages of 3.3V and 1.1V, at room temperature.

Parameter                       Min   Typical  Max   Unit
Input frequency                 2412  N/A      2484  MHz
Input impedance                 N/A   50       N/A   Ω
Input reflection                N/A   N/A      -10   dB
PA output power at 72.2 Mbps    15.5  16.5     17.5  dBm
PA output power in 11b mode     19.5  20.5     21.5  dBm
Sensitivity
DSSS, 1 Mbps                    N/A   -98      N/A   dBm
CCK, 11 Mbps                    N/A   -91      N/A   dBm
6 Mbps (1/2 BPSK)               N/A   -93      N/A   dBm
54 Mbps (3/4 64-QAM)            N/A   -75      N/A   dBm
HT20, MCS7 (65 Mbps, 72.2 Mbps) N/A   -72      N/A   dBm
Adjacent channel rejection
OFDM, 6 Mbps                    N/A   37       N/A   dB
OFDM, 54 Mbps                   N/A   21       N/A   dB
HT20, MCS0                      N/A   37       N/A   dB
HT20, MCS7                      N/A   22       N/A   dB
Minimum System
Minimum for ESP-M

  1. The module's working voltage is DC 3.3V;
  2. The maximum current from an IO pin of this module is 12mA;
  3. The RST pin is active at low level, and the EN pin is active at high level;
  4. To enter update mode, hold GPIO0 low and then reset or power-cycle the module; to enter working mode, hold GPIO0 high and then reset or power-cycle the module;
  5. The module's TXD is connected to the RXD of the other MCU, and the module's RXD is connected to that MCU's TXD.
ESP-M2

Difference from ESP-M1

The ESP-M1 has an external antenna interface and the ESP-M2 does not; all other parameters are the same as those of the ESP-M1.

Therefore, please refer to the ESP-M1 documentation.

ESP-M3

ESP-M4

ESP32

ESP32IPX

ESP-1

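Getting Started

This part covers introductory data science material, including basic tools such as Jupyter and Linux, Python's fundamental data science package NumPy, and the plotting package Matplotlib.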

Processing Data with Linux

Date:2016-04-03

This section records some Linux and command-line tricks I use when processing data. Most of the commands also work on OS X.

Why Use Linux and the Command Line

Have you ever noticed in the movies when the “super hacker,”— you know, the guy who can break into the ultra-secure military computer in under thirty seconds —sits down at the computer, he never touches a mouse? It’s because movie makers realize that we, as human beings, instinctively know the only way to really get anything done on a computer is by typing on a keyboard.

Most computer users today are only familiar with the graphical user interface (GUI) and have been taught by vendors and pundits that the command line interface (CLI) is a terrifying thing of the past. This is unfortunate, because a good command line interface is a marvelously expressive way of communicating with a computer in much the same way the written word is for human beings. It’s been said that “graphical user interfaces make easy tasks easy, while command line interfaces make difficult tasks possible” and this is still very true today.

—The Linux Command Line

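Installing and Configuring Linux

If you have no programming or command-line background, I recommend Ubuntu; once you have some experience you can switch to Debian, which has many packages and is stable. Once you are in the system, install Anaconda[1] and you have a basic Python environment and data toolbox.

Next, open a terminal emulator. OS X ships with Terminal, but I recommend iTerm 2; on Debian it is Konsole. Then set up the ultimate shell environment, Oh My ZSH!

sh -c "$(curl -fsSL https://raw.github.com/robbyrussell/oh-my-zsh/master/tools/install.sh)"

If you are told that dependencies are missing, install them first. On Ubuntu, for example, you need to install Git and zsh first:

sudo apt-get install git zsh

Then install csvkit, a small toolkit for viewing and converting data:

sudo apt-get install csvkit

With that, you are ready to start your data science journey.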

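Basic Commands

Many data problems can be solved on the command line, some of them very efficiently, provided you know the basics. The entry-level commands are simply these:

pwd    # print the name of the current working directory
cd     # change directory
ls     # list directory contents
file   # determine file type
less   # view file contents
cp     # copy files and directories
mv     # move/rename files and directories
mkdir  # create directories
rm     # remove files and directories
ln     # create hard and symbolic links

The # above starts a comment; the shell ignores everything after it.

The entry-level commands are too simple to cover here. If you have never touched the command line, check the built-in help when unsure, e.g. ls --help. I also recommend some good books.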

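Redirection and Pipes

If you have read any of the recommended books carefully and done a few exercises, you are past the command-line basics. Before getting to data processing on the command line, I want to talk about redirection and pipes: if Linux held a "most frequently used" contest for its symbols, these two would be the champions.

Normally, when you type ls -l in a terminal, the result looks something like this: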

➜  Dropbox ls -l
total 704
drwxr-xr-x@ 19 Scott  staff   646B Apr  3 10:27 A.HighlyEffectiveSelf
drwxr-xr-x@  5 Scott  staff   170B Apr  3 09:28 B.CreativePleasure
drwxr-xr-x@ 10 Scott  staff   340B Apr  3 09:10 C.TheArtOfWork
drwxr-xr-x@ 11 Scott  staff   374B Mar 10 12:40 D.HistoricalMemory
drwxr-xr-x@ 12 Scott  staff   408B Apr  3 10:09 E.DataBank
drwxr-xr-x@ 19 Scott  staff   646B Apr  3 09:14 F.BambooBasket
drwxr-xr-x@  5 Scott  staff   170B Mar  9 12:34 G.Other
-rw-r--r--@  1 Scott  staff     0B Apr  3 07:58 Icon?
-rw-rw-r--@  1 Scott  staff   312B Apr  3 16:18 README.md

The files are sorted by name. What happens if we add a pipe, |?

➜  Dropbox ls -l | sort
-rw-r--r--@  1 Scott  staff     0B Apr  3 07:58 Icon
-rw-rw-r--@  1 Scott  staff   312B Apr  3 16:18 README.md
drwxr-xr-x@  5 Scott  staff   170B Apr  3 09:28 B.CreativePleasure
drwxr-xr-x@  5 Scott  staff   170B Mar  9 12:34 G.Other
drwxr-xr-x@ 10 Scott  staff   340B Apr  3 09:10 C.TheArtOfWork
drwxr-xr-x@ 11 Scott  staff   374B Mar 10 12:40 D.HistoricalMemory
drwxr-xr-x@ 12 Scott  staff   408B Apr  3 10:09 E.DataBank
drwxr-xr-x@ 19 Scott  staff   646B Apr  3 09:14 F.BambooBasket
drwxr-xr-x@ 19 Scott  staff   646B Apr  3 10:27 A.HighlyEffectiveSelf
total 704

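Here I appended | sort, and you can see that the order of the files has changed. It looks like an ordering by file size, but sort does not single out a column by default: it compares whole lines as text, and in this listing that order happens to coincide with the size order (which is also why the total line ends up last). In short, ls -l writes its listing to standard output, and sort receives that listing on its standard input and writes the re-sorted lines to its own standard output. That is what the pipe | does: it connects the standard output of one command to the standard input of the next. This is extremely useful, because it lets you chain simple commands into very complex operations.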
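Because sort compares whole lines as text unless told otherwise, text order and numeric order can disagree. The sketch below illustrates the difference in Python. The three listing lines are made-up sample data, and Python compares strings by code point whereas sort follows the current locale, so this is an analogy rather than an exact model of sort.

```python
# Made-up listing lines: permissions, links, owner, group, size, name.
rows = [
    "-rw-r--r-- 1 scott staff 9 a.txt",
    "-rw-r--r-- 1 scott staff 10 b.txt",
    "-rw-r--r-- 1 scott staff 312 c.txt",
]

# Plain text comparison, roughly what `sort` does with no options.
by_text = sorted(rows)

# Numeric comparison on the 5th field, roughly `sort -n -k 5`.
by_size = sorted(rows, key=lambda line: int(line.split()[4]))

print([line.split()[4] for line in by_text])  # ['10', '312', '9']
print([line.split()[4] for line in by_size])  # ['9', '10', '312']
```

To sort a listing by size reliably, name the column and ask for a numeric comparison instead of relying on the default text order.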

And what is redirection, '>'? The result of a command is normally displayed on the screen. Is there another way? Try this:

ls -l > ex01.txt

Nothing seems to happen, but a new file, ex01.txt, has appeared in the working directory. Take a look at it:

➜  Dropbox cat ex01.txt
total 704
drwxr-xr-x@ 19 Scott  staff  646 Apr  3 10:27 A.HighlyEffectiveSelf
drwxr-xr-x@  5 Scott  staff  170 Apr  3 09:28 B.CreativePleasure
drwxr-xr-x@ 10 Scott  staff  340 Apr  3 09:10 C.TheArtOfWork
drwxr-xr-x@ 11 Scott  staff  374 Mar 10 12:40 D.HistoricalMemory
drwxr-xr-x@ 12 Scott  staff  408 Apr  3 10:09 E.DataBank
drwxr-xr-x@ 19 Scott  staff  646 Apr  3 09:14 F.BambooBasket
drwxr-xr-x@  5 Scott  staff  170 Mar  9 12:34 G.Other
-rw-r--r--@  1 Scott  staff    0 Apr  3 07:58 Icon
-rw-rw-r--@  1 Scott  staff  312 Apr  3 16:18 README.md
-rw-rw-r--   1 Scott  staff    0 Apr  3 16:57 ex01.txt

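The output is now in the file. This is redirection: it lets us redefine where standard output goes, simply by putting a file name after '>'. It is very practical; after processing data, for instance, you will certainly want to save the result to a file. One caveat: '>' overwrites the existing contents of the file, so use '>>' if you want to append.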
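The difference between '>' and '>>' corresponds to the write ("w") and append ("a") file modes in Python. A minimal sketch that uses a temporary directory, so that no real file is touched; the file name ex01.txt is reused from the example above.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "ex01.txt")

with open(path, "w") as f:   # like: echo first > ex01.txt
    f.write("first\n")
with open(path, "w") as f:   # '>' again: the old contents are replaced
    f.write("second\n")
with open(path, "a") as f:   # like: echo third >> ex01.txt
    f.write("third\n")

with open(path) as f:
    content = f.read()
print(content)
```

The file ends up holding only the second and third lines, which is exactly what happens when a shell script uses '>' where '>>' was intended.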

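Common Commands for Processing Data
Line Filtering

When you get a very large data set, you certainly don't want to view all of it right away: it is unnecessary, and it is slow to open. Instead you want to filter lines and look at a small part. Common line-filtering commands are head, tail, and sed.

View the first 10 lines: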

➜  ~ head user_service_time.txt
bid service_time    weekday hour    lasttime
17283201    2016-1-27 8:30:00   3   8   3.0
17283201    2016-1-29 9:00:00   5   9   3.0
17283201    2016-2-22 17:00:00  1   17  3.0
17283201    2016-2-25 16:00:00  4   16  3.0
17283201    2016-2-29 16:30:00  1   16  3.0
17283201    2016-3-2 9:00:00    3   9   3.0
17283201    2014-9-19 9:00:07   5   9
17283201    2014-11-3 13:00:00  1   13
17283201    2014-11-22 15:00:00 6   15  3

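View the first 5 lines:

head -5 filename

View the first N lines:

head -n N filename

tail is the opposite of head: it shows the last lines. To pick specific lines, use sed or awk; for lines 4-6, for example, use sed -n '4, 6p' filename. Here, for readability, I first print the line numbers with nl.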

➜  ~ nl user_service_time.txt | sed -n '4, 6p'
     4  17283201    2016-2-22 17:00:00  1   17  3.0
     5  17283201    2016-2-25 16:00:00  4   16  3.0
     6  17283201    2016-2-29 16:30:00  1   16  3.0
# view the odd-numbered lines
➜  ~ nl user_service_time.txt | head |  awk 'NR%2'
     1  bid service_time    weekday hour    lasttime
     3  17283201    2016-1-29 9:00:00   5   9   3.0
     5  17283201    2016-2-25 16:00:00  4   16  3.0
     7  17283201    2016-3-2 9:00:00    3   9   3.0
     9  17283201    2014-11-3 13:00:00  1   13
# view the even-numbered lines
➜  ~ nl user_service_time.txt | head |  awk '(NR+1)%2'
     2  17283201    2016-1-27 8:30:00   3   8   3.0
     4  17283201    2016-2-22 17:00:00  1   17  3.0
     6  17283201    2016-2-29 16:30:00  1   16  3.0
     8  17283201    2014-9-19 9:00:07   5   9
    10  17283201    2014-11-22 15:00:00 6   15  3
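For comparison, the same line selections can be sketched in Python with itertools.islice. The helper names (head, line_range, odd_lines) are hypothetical and chosen to mirror the commands above; the sample rows are placeholders, not the real data file.

```python
from itertools import islice


def head(lines, n=10):
    """First n lines, like `head -n N`."""
    return list(islice(lines, n))


def line_range(lines, first, last):
    """Lines first..last (1-based, inclusive), like `sed -n 'first,lastp'`."""
    return list(islice(lines, first - 1, last))


def odd_lines(lines):
    """Odd-numbered lines, like `awk 'NR%2'`."""
    return [line for number, line in enumerate(lines, start=1) if number % 2]


sample = [f"row {i}" for i in range(1, 11)]
print(head(sample, 3))
print(line_range(sample, 4, 6))
print(odd_lines(sample))
```

Because islice consumes its input lazily, head and line_range also work on an open file object without reading the whole file into memory.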
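Column Extraction

Extracting lines is easy, so how do we extract columns?

# Convert all tab characters to (ASCII) commas, then redirect into a csv file. cat works for .txt files; Excel files need in2csv
cat user_service_time.txt | tr '\t' ',' > user_service_time.csv

# Look at the first 3 lines to see which columns there are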
➜  ~ head -3 user_service_time.csv | csvlook
|-----------+--------------------+---------+------+-----------|
|  bid      | service_time       | weekday | hour | lasttime  |
|-----------+--------------------+---------+------+-----------|
|  17283201 |  2016-1-27 8:30:00 |  3      |  8   |  3.0      |
|  17283201 |  2016-1-29 9:00:00 |  5      |  9   |  3.0      |
|-----------+--------------------+---------+------+-----------|
# There are 5 columns in total; extract the last 3 columns and look at the first 10 lines
➜  ~ < user_service_time.csv csvcut -c 3-5 | head | csvlook
|----------+------+-----------|
|  weekday | hour | lasttime  |
|----------+------+-----------|
|   3      |  8   |  3.0      |
|   5      |  9   |  3.0      |
|   1      |  17  |  3.0      |
|   4      |  16  |  3.0      |
|   1      |  16  |  3.0      |
|   3      |  9   |  3.0      |
|   5      |  9   |           |
|   1      |  13  |           |
|   6      |  15  |  3        |
|----------+------+-----------|
# -C excludes columns instead, e.g. exclude columns 3-5 and show the first 5 lines.
➜  ~ < user_service_time.csv csvcut -C 3-5 | head -5 | csvlook
|-----------+----------------------|
|  bid      | service_time         |
|-----------+----------------------|
|  17283201 |  2016-1-27 8:30:00   |
|  17283201 |  2016-1-29 9:00:00   |
|  17283201 |  2016-2-22 17:00:00  |
|  17283201 |  2016-2-25 16:00:00  |
|-----------+----------------------|
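The tab-to-comma conversion and the column selection can also be sketched with Python's standard csv module. The two sample rows are copied from the listing above; reading from an in-memory string instead of user_service_time.txt is an assumption made to keep the example self-contained.

```python
import csv
import io

# Header and first data row from the example file, tab-separated.
tsv_text = (
    "bid\tservice_time\tweekday\thour\tlasttime\n"
    "17283201\t2016-1-27 8:30:00\t3\t8\t3.0\n"
)

# Parse tab-separated rows, then write them back out comma-separated.
rows = list(csv.reader(io.StringIO(tsv_text), delimiter="\t"))
out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(rows)
csv_text = out.getvalue()

# Keep only columns 3-5, similar to `csvcut -c 3-5`.
last_three = [row[2:5] for row in rows]

print(csv_text)
print(last_three)
```

Unlike a plain character substitution with tr, the csv writer quotes any field that itself contains a comma, so the output stays valid CSV.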
grep Search
# Basic usage: grep pattern filename
➜  ~ head user_service_time.txt | grep 29
17283201    2016-1-29 9:00:00   5   9   3.0
17283201    2016-2-29 16:30:00  1   16  3.0
wc Basic Statistics
  • Count lines: wc -l file
  • Count words: wc -w file
  • Count bytes: wc -c file (use wc -m to count characters)
➜  ~ wc -l user_service_time.txt
    1244 user_service_time.txt
➜  ~ < user_service_time.txt | grep 2016-2 | wc -l
      55
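Filtering with grep and counting with wc -l amounts to counting the lines that contain a substring. A small Python sketch on the nine data rows shown earlier, reduced here to the bid and service_time columns; it matches plain substrings only, whereas grep patterns are regular expressions.

```python
# bid and service_time taken from the first data rows of the example file.
lines = [
    "17283201\t2016-1-27 8:30:00",
    "17283201\t2016-1-29 9:00:00",
    "17283201\t2016-2-22 17:00:00",
    "17283201\t2016-2-25 16:00:00",
    "17283201\t2016-2-29 16:30:00",
    "17283201\t2016-3-2 9:00:00",
    "17283201\t2014-9-19 9:00:07",
    "17283201\t2014-11-3 13:00:00",
    "17283201\t2014-11-22 15:00:00",
]

# Like: grep 2016-2 file | wc -l
february = [line for line in lines if "2016-2" in line]
print(len(february))  # 3

# Like: grep 29 file
with_29 = [line for line in lines if "29" in line]
print(len(with_29))  # 2
```

Note that the search string can match anywhere in the line, which is why 29 finds dates here but could just as well hit an ID or a time of day in other data.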
sort Sorting
  • -n sort numerically
  • -d sort in dictionary order
  • -r sort in reverse order
  • -k N sort by field N
# Sort numerically by column 1, in reverse order
➜  ~ < user_service_time.csv | sort -nrk 1 | head -4 | csvlook
|-----------------+-------------------+---+---+----|
|  29101041557001 | 2016-3-6 8:30:00  | 7 | 8 | 4  |
|-----------------+-------------------+---+---+----|
|  29101041557001 | 2016-3-13 8:30:00 | 7 | 8 | 4  |
|  29101041557001 | 2016-2-28 8:30:00 | 7 | 8 | 4  |
|  29101041557001 | 2016-2-21 8:30:00 | 7 | 8 | 4  |
|-----------------+-------------------+---+---+----|
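Other

Tools such as iconv, cut, paste, and uniq are also excellent, just used a little less often; for the actual data analysis, Pandas is more convenient. I will add more as it comes to mind.

Footnotes

[1] A platform that bundles the packages commonly used for scientific computing in Python; installing it gives you Python, NumPy, SciPy, Matplotlib, IPython, Jupyter, and more.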

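Installing and Using Jupyter

Date:2016-04-04
IPython and Jupyter

IPython is a Python REPL shell, far more powerful than the one that ships with Python. Jupyter Notebook is a web application built on the IPython REPL; results are saved with the .ipynb extension. It is highly interactive and WYSIWYG, which makes it an excellent tool for data analysis and for writing analysis reports.

Official description: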

The Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations and explanatory text. Uses include: data cleaning and transformation, numerical simulation, statistical modeling, machine learning and much more.
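Installation

If you have read my previous article, "Processing Data with Linux", and installed Anaconda, you already have Jupyter. Open a terminal and type jupyter notebook.

Remote Use

If you need remote access, the remote server needs some configuration.

# Install the ssh server on the server, e.g. on Debian
sudo apt-get install openssh-server
# If you are not sure whether ssh is running, restart it
sudo service ssh restart

Then try accessing it from another computer on the LAN. On a Mac you can simply type ssh username@ip in a terminal, where ip is this server's LAN IP address. I suggest assigning it a static IP in the router; for access from outside the LAN, set up port forwarding in the router.

Then follow the official tutorial, Running a notebook server, to configure it.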

Attention

The certfile and keyfile paths in jupyter_notebook_config.py must be absolute paths.
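One way to avoid a relative path slipping in is to build the paths in Python and paste the printed lines into the config file. This is a sketch under assumptions: the files are assumed to live in ~/.jupyter under the names used in the alias in this section, and c.NotebookApp.certfile and c.NotebookApp.keyfile are the option names of the classic Notebook server.

```python
import os

# Assumed location of the Jupyter config directory; adjust if yours differs.
jupyter_dir = os.path.abspath(os.path.expanduser("~/.jupyter"))
certfile = os.path.join(jupyter_dir, "mycert.pem")
keyfile = os.path.join(jupyter_dir, "mykey.key")

# Lines to paste into jupyter_notebook_config.py.
print(f"c.NotebookApp.certfile = {certfile!r}")
print(f"c.NotebookApp.keyfile = {keyfile!r}")
```

The snippet only builds the paths; it does not check that the certificate and key files actually exist.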

Typing a long string of options is tedious and error-prone, so I suggest opening .zshrc and adding a line at the bottom:

alias jn='jupyter notebook --certfile=/home/scott/.jupyter/mycert.pem --keyfile /home/scott/.jupyter/mykey.key'

Then, from the other computer, type:

jn

If you see output similar to the following, the configuration succeeded.

[I 09:56:29.937 NotebookApp] The Jupyter Notebook is running at: https://[all ip addresses on your system]:8889/

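I configured port 8889; you can choose a different one. Then set up port forwarding in the router and you are done. If you use R, RStudio Server can be configured in a similar way; it is very simple.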

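Shortcuts

Keyboard shortcuts are a highlight of Jupyter Notebook. If you have read my article "Making the CapsLock Key More Useful"[1] and configured your Mac or Windows machine accordingly, then once you know a few Jupyter shortcuts you hardly need the mouse for writing reports in Jupyter.

Cell Types

A Jupyter Notebook document consists of a series of cells. The two main cell types are:

  • markdown cell: in command mode, press m to switch the cell to a markdown cell
  • code cell: in command mode, press y to switch the cell to a code cell
Common Shortcuts
  • Show keyboard shortcut help: h
  • Save: s
  • Move between cells: j, k
  • Add a cell: a, b
  • Delete a cell: dd
  • Edit cells (cut, copy, paste, undo): x, c, v, z
  • Interrupt the kernel: ii
  • Restart the kernel: 00
  • Comment code: Ctrl + /
Splitting Cells

In edit mode, press control + shift + - to split a cell.

Inspecting Objects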
import numpy as np

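Press tab to see completion suggestions:

np.<tab>

Find objects in the numpy module whose names contain cos:

np.*cos*?

Show help for the numpy module:

np?

Show more detailed help for the numpy module:

np??

Show the docstring: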

%pdoc np
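Magic Commands

The most commonly used one is %matplotlib inline, which is somewhat like ipython --pylab and is used for plotting. To time a statement, just put %time in front of it.

More magic commands can be listed in Jupyter Notebook by typing: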

%lsmagic
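Outside a notebook, a similar one-off timing can be taken with the standard library. The sketch below is not how %time is implemented; the helper name `timed` is hypothetical, and it measures wall-clock time only.

```python
import time


def timed(func, *args, **kwargs):
    """Run func once and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start


total, seconds = timed(sum, range(1_000_000))
print(total, f"{seconds:.4f} s")
```

A single run is noisy; for small snippets, averaging over many runs (as %timeit or the timeit module does) gives steadier numbers.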
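Conversion
# Convert an ipynb file to html
jupyter nbconvert --to html filename.ipynb

For more conversion options, type:

jupyter nbconvert --help

Converting to rst, py, md, and other formats is very convenient, but conversion to pdf does not handle Chinese well: LaTeX must be installed first, and LaTeX in turn needs its fonts configured.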

Footnotes

[1] An article on making the keyboard more efficient; yesterday I also updated the gist with my karabiner configuration.

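Data Plotting (Seaborn)

Fundamentals

This part introduces the fundamentals of data science.

Probability and Statistical Analysis (StatsModels)

Numerical Computing (Scipy)

Linear Models (StatsModels)

Introduction to Machine Learning (SKlearn)

Tools

This part collects books, good resources, and handy tools for Python data science.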