Welcome to WzDat’s documentation!

WzDat stands for Webzen Data Analysis Toolkit (pronounce it like “What’s that?”), which started as an attempt to build an IPython & Pandas based data analysis system. WzDat augments the power of IPython & Pandas.

Currently WzDat consists of three applications: the WzDat Python Module, the Dashboard, and the WzDat Forwarder for Windows. This document covers the WzDat Python Module & Dashboard.

User’s Guide

Foreword

If you are a Python enthusiast or data scientist, you might have heard about IPython and Pandas. Together they form one of the most beloved tools of this data-centric era. IPython gives us a nice interactive Python programming environment, and Pandas provides easy-to-use professional data tools. They are convenient and powerful tools not only for casual data work, but also for big companies and laboratories. WzDat was started to empower this great combination even more.

Profits

Let’s say you have thousands of log files and want to find some of them by their traits and the specific words they contain, then feed the result to Pandas for more delicate analysis. How would you do that? Some scripts or shell commands might help, but the process is (in general) cumbersome and error-prone to repeat every time you get new files.
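For illustration, a hand-rolled version of such a script might look like the sketch below. It is not part of WzDat; the function name `find_logs` and the file naming pattern in the usage note are assumptions made up for this example. It walks a directory tree, keeps files whose names match a pattern, and then scans each one for a keyword.

```python
import os
import re


def find_logs(root, name_pattern, keyword):
    """Return sorted paths under root whose file name matches name_pattern
    (a regular expression) and whose content contains keyword."""
    name_re = re.compile(name_pattern)
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            if not name_re.search(fname):
                continue
            path = os.path.join(dirpath, fname)
            # Read line by line so large logs are not loaded into memory at once.
            with open(path, encoding="utf-8", errors="replace") as f:
                if any(keyword in line for line in f):
                    matches.append(path)
    return sorted(matches)
```

A call such as `find_logs("/data", r"^game_.*\.log$", "ERROR")` re-reads every matching file on each run, which is the kind of repeated, ad hoc work described above.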

That’s where WzDat comes in handy. Once the WzDat Python module and a compliant project are imported in the IPython notebook environment, it will pick out your target files with impressive speed, even from hundreds of thousands of files.

When you have finished your analysis in the IPython notebook, you can forward the outcome to the WzDat Dashboard, where your colleagues can instantly view your work through a concise web interface.

Docker Dependent

Docker is a great platform that solves complex deployment problems. WzDat is only one of the building blocks of an IPython & Pandas based data analysis system; building one also takes a lot of other software, configuration and administration work. To make deployment simpler, I chose Docker as the default platform. Though it may sound strange, WzDat is premised on Docker, and its code presumes a certain environment (folders, files, packages) provided by its Docker container.

What’s new in WzDat 0.1?

This is the initial public release.

Requirements

Linux System

WzDat has been developed and tested on dockerized Linux (Ubuntu). You can deploy a WzDat server on an OSX or Windows host by using tools like Boot2Docker, but unknown defects could arise.

Warning

Mounting a directory from OSX or Windows into the Docker container will not work.

WzDat Solution & Project

Before using WzDat to analyze your data, you need a solution and a project. Refer to Solution And Projects and create your own solution & project.

Solution And Projects

Your WzDat solution is the place where your projects live. Your WzDat projects are the place where your domain-specific file adapters and utilities reside.

Note

There are many log/data formats. They have different file naming rules, date-time formats and log level formats. File adapters exist to absorb these differences. Refer to the following section for further details.

Creating The Skeleton

Let’s start with bare-bones folders & files:

ws_mysol/
   ws_mysol/
      __init__.py
      myprj/
         __init__.py

In this case, your solution name is mysol (preferably, the solution name comes from your personal name or company name). The topmost ws_mysol is the solution folder, and the inner ws_mysol folder is the solution Python package. Inside the solution package there is a WzDat project Python package, myprj.

Tip

I advise you to start your solution name with the ws_ prefix (ws stands for WzDat Solution).

As the need arises, you can add more projects to your solution.
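The skeleton above can also be created with a few lines of Python. This is only a convenience sketch; `make_skeleton` is a hypothetical helper written for this page, not a WzDat command.

```python
from pathlib import Path


def make_skeleton(base_dir, sol_name, prj_name):
    """Create ws_<sol>/ws_<sol>/<prj>/ with empty __init__.py files.

    Returns the path of the project package folder.
    """
    sol_pkg = "ws_" + sol_name
    prj_dir = Path(base_dir) / sol_pkg / sol_pkg / prj_name
    prj_dir.mkdir(parents=True, exist_ok=True)
    # Both the solution package and the project package need an __init__.py.
    (prj_dir.parent / "__init__.py").touch()
    (prj_dir / "__init__.py").touch()
    return prj_dir
```

Calling `make_skeleton(".", "mysol", "myprj")` produces the ws_mysol layout shown above. Calling it again with another project name adds a second project package next to the first.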

File Adapter

A file adapter is a Python module in which you implement functions that supply the required information about the file format you are dealing with. For example, if you want your WzDat project to handle two types of file (log, dbdump), your project might look like this:

ws_mysol/
   ws_mysol/
      __init__.py
      myprj/
         __init__.py
         log.py     <--
         dbdump.py  <--

File Type   File Extension   Adapter
Log         .log             log.py
DB Dump     .csv             dbdump.py
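To give a rough feel for the kind of format knowledge an adapter encapsulates (file naming rule, date-time format, log level format), here is a small sketch. Everything in it is hypothetical: the function names `parse_file_name` and `parse_line`, the file naming rule and the line layout are invented for illustration and are not the functions WzDat asks an adapter to implement.

```python
import re
from datetime import datetime

# Hypothetical naming rule: <kind>_<YYYY-MM-DD>.log, e.g. "game_2014-09-01.log".
FILE_NAME_RE = re.compile(r"^(?P<kind>\w+?)_(?P<date>\d{4}-\d{2}-\d{2})\.log$")
# Hypothetical line layout: "<YYYY-MM-DD HH:MM:SS> [<LEVEL>] <message>".
LINE_RE = re.compile(
    r"^(?P<dt>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(?P<level>[A-Z]+)\] "
)


def parse_file_name(fname):
    """Return (kind, date) for a matching file name, or None."""
    m = FILE_NAME_RE.match(fname)
    if m is None:
        return None
    return m.group("kind"), datetime.strptime(m.group("date"), "%Y-%m-%d").date()


def parse_line(line):
    """Return (datetime, level) for a matching log line, or None."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return datetime.strptime(m.group("dt"), "%Y-%m-%d %H:%M:%S"), m.group("level")
```

Keeping such rules in one module per file type means the rest of the project does not need to know how a particular log or dump is named or formatted.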

Each adapter module is asked to implement the following functions:

Configs

To supply settings, config files are placed in your solution and project packages:

ws_mysol/
   ws_mysol/
      __init__.py
      config.yaml     <--
      myprj/
         __init__.py
         config.yaml  <--
         log.py
         dbdump.py

If some settings are common to all your projects, you can place them in the solution config file. If others are specific to a certain project, create a project config file for them.
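One way to picture the relationship between the two files is as a pair of dictionaries, where project settings are layered on top of solution-wide ones. The sketch below assumes that a project value overrides a solution value of the same name; the source does not spell out WzDat's actual precedence rule, and the setting names are made up.

```python
def merge_configs(solution_cfg, project_cfg):
    """Return a new dict: solution-wide settings, overridden by project ones.

    Assumption: project settings take precedence over solution settings.
    """
    merged = dict(solution_cfg)
    merged.update(project_cfg)
    return merged


# Hypothetical settings, as they might look after loading each config.yaml.
solution_cfg = {"encoding": "utf-8", "timezone": "UTC"}
project_cfg = {"timezone": "Asia/Seoul"}
effective = merge_configs(solution_cfg, project_cfg)
```

Here `effective` keeps the shared `encoding` from the solution and takes `timezone` from the project, while both input dictionaries are left unchanged.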

Notes Folder

Finally, create a __notes__ folder to accommodate your IPython notebooks, with a nested folder per project:

ws_mysol/
   __notes__     <--
      myprj/     <--
   ws_mysol/
      __init__.py
      config.yaml
      myprj/
         __init__.py
         config.yaml
         log.py
         dbdump.py

When you create a new IPython notebook, it will be put directly into ws_mysol/__notes__/myprj.
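Before moving on, it can help to check that the folder layout is complete. The following sketch reports which of the required pieces are missing; `missing_layout_paths` is a hypothetical helper, and it assumes (as in the examples above) that the solution package has the same name as the solution folder. Config files and adapters are not checked, since which ones you need depends on your project.

```python
from pathlib import Path


def missing_layout_paths(sol_dir, prj_name):
    """Return the expected paths (relative to sol_dir) that do not exist yet."""
    sol_dir = Path(sol_dir)
    # Assumption: the inner package folder is named like the solution folder.
    pkg_dir = sol_dir / sol_dir.name
    expected = [
        sol_dir / "__notes__" / prj_name,
        pkg_dir / "__init__.py",
        pkg_dir / prj_name / "__init__.py",
    ]
    return [p.relative_to(sol_dir).as_posix() for p in expected if not p.exists()]
```

An empty list from `missing_layout_paths("ws_mysol", "myprj")` means the notes folder and both packages are in place.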

Ignition

Warning

WzDat requires a compliant solution & project to be used as a data analysis system. If this is your first time setting up WzDat, visit the Solution And Projects page to get acquainted with them.

After you are done making your solution & project, clone wzdat-sys and build a local Docker image.

$ git clone https://github.com/haje01/wzdat-sys
$ cd wzdat-sys
$ sys/build.sh

Replace the (..) placeholders with your own values, then run the script.

$ WZDAT_HOST=(server-host-name) \
> WZDAT_DATA_DIR=(data-folder) \
> WZDAT_SOL_DIR=(solution-folder) \
> WZDAT_SOL_PKG=(solution-package-name) \
> WZDAT_PRJ=(project-id) \
> WZDAT_IPYTHON_PORT=(ipython-port) \
> WZDAT_DASHBOARD_PORT=(dashboard-port) \
> sys/run.sh
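If you prefer to launch the script from Python rather than typing the variables by hand, the same environment can be assembled in a dictionary. This is a sketch only: `build_env` and `run_wzdat` are hypothetical helpers, and the only things taken from WzDat are the variable names and the sys/run.sh path shown above.

```python
import os
import subprocess


def build_env(host, data_dir, sol_dir, sol_pkg, prj, ipython_port, dashboard_port):
    """Return a copy of the current environment with the WZDAT_* variables set."""
    env = dict(os.environ)
    env.update({
        "WZDAT_HOST": host,
        "WZDAT_DATA_DIR": data_dir,
        "WZDAT_SOL_DIR": sol_dir,
        "WZDAT_SOL_PKG": sol_pkg,
        "WZDAT_PRJ": prj,
        # Environment values must be strings, so ports are converted.
        "WZDAT_IPYTHON_PORT": str(ipython_port),
        "WZDAT_DASHBOARD_PORT": str(dashboard_port),
    })
    return env


def run_wzdat(env, script="sys/run.sh"):
    """Run the script with the given environment.

    Must be called from the wzdat-sys checkout so the relative path resolves.
    """
    return subprocess.run([script], env=env, check=True)
```

Keeping the values in one function call makes it easier to store them alongside your solution and reuse them on the next deployment.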

Tutorial

Let’s have a quick glimpse of WzDat. In this tutorial, we’re going to build a simple solution & project to analyze Linux Syslog.

Build Docker Container

Make Solution & Project

Data Directory

Deploy By Docker

After you are done with the solution & project, clone wzdat-sys and build a local Docker image.

$ git clone https://github.com/haje01/wzdat-sys
$ cd wzdat-sys
$ sys/build.sh

Set the variables to match your environment, then run the script. For example:

$ WZDAT_HOST=my.host.com \
> WZDAT_DATA_DIR=/home/myhome/data \
> WZDAT_SOL_DIR=/home/myhome/ws_mysol \
> WZDAT_SOL_PKG=ws_mysol \
> WZDAT_PRJ=myprj \
> WZDAT_IPYTHON_PORT=8085 \
> WZDAT_DASHBOARD_PORT=8080 \
> sys/run.sh

Select, Find And Analyze

Expose Result To Dashboard

Additional Notes

WzDat Changelog

Version 0.1.0

Released on September 1, 2014.

First public release.
