Software Requirements Specification for OMG Version 0.1

Introduction

This document specifies the requirements for the project OMG (“Oh My Genes”), a web application for identifying differentially expressed genes.

Purpose

OMG identifies differentially expressed genes from a gene expression file containing data for two cell samples. Its main purpose is to help biologists analyze their experimental data.

Overview

The web application has a simple interface with a single button [Upload and GO]. A user uploads a plain text file containing gene expression levels from two samples, representing two experimental conditions. After accepting the file, the software returns a table of differentially expressed genes and a scatter plot of these genes, with the control sample on the X-axis and the treatment sample on the Y-axis. If the uploaded file is not a valid gene expression file, the web application returns a page asking the user to provide a file in the correct format.

User Characteristics

  • Client User: Biologists who study gene expression.
  • Site Maintainer: Technicians with basic knowledge of Flask and Python who maintain the website.

Terminology and Abbreviations

This section explains the terminology and abbreviations used in this document, to make the project and its requirements easier to understand.

Terminology

  • Control sample - A cell sample prepared under normal conditions.
  • Treatment sample - A cell sample treated with specific chemicals, or one in which some genes have been altered.
  • Differentially expressed genes - The genes which have significantly different expression levels between two samples.
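  • Up-regulation - A gene is said to be up-regulated if it has higher expression in treatment than in control.
  • Down-regulation - A gene is said to be down-regulated if it has lower expression in treatment than in control.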

Abbreviations

  • OMG - Oh My Genes, which is the name of this project.
  • logFC - Log fold change of gene expression, computed as log_2(T/C), where T is the gene's expression level in the treatment sample and C is its expression level in the control sample.
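The logFC definition can be sketched in Python, the language the site maintainers are expected to know. This is a minimal illustration under stated assumptions, not the project's implementation: the function name log_fold_change and the optional pseudocount parameter are hypothetical, introduced because this SRS does not say how genes with zero expression (such as AT1G01046 in the example input) should be handled.

```python
import math


def log_fold_change(treatment: float, control: float, pseudocount: float = 0.0) -> float:
    """Return log2(T/C) for one gene.

    The pseudocount is an assumption of this sketch, not part of the SRS:
    a small positive value lets genes with zero expression get a finite logFC.
    """
    t = treatment + pseudocount
    c = control + pseudocount
    if t <= 0 or c <= 0:
        # log2(T/C) is undefined when either level is zero.
        raise ValueError("log2(T/C) is undefined when T or C is zero")
    return math.log2(t / c)


# First gene of the example input: C = 1.198558083, T = 2.036161827
print(round(log_fold_change(2.036161827, 1.198558083), 2))  # 0.76
```

With the default pseudocount of 0 the function rejects zero values instead of silently returning an infinite or undefined logFC; whether OMG should skip such genes or add a pseudocount is a decision this SRS leaves open.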

Functional Requirements

Input

A valid gene expression file has the following format:
  • It is a TAB-delimited plain text file with three columns.
  • It contains an optional header line, followed by one line per gene giving the gene ID, the gene's expression level in the control sample (e.g., ControlSample), and its expression level in the treatment sample (e.g., KnockOutSample).

An example file looks like this:

gene_id ControlSample KnockOutSample
AT1G01010 1.198558083 2.036161827
AT1G01020 13.75736234 13.370796
AT1G01030 0.833779536 0.203616183
AT1G01040 9.58846466 7.126566394
AT1G01046 0 0
AT1G01050 23.81482799 21.10821094
AT1G01060 0.625334652 1.221697096
AT1G01070 1.719670292 0.950208853
AT1G01080 28.34850421 25.24840665
AT1G01090 58.26034505 42.96301455
AT1G01100 1066.508249 1308.030358
AT1G01110 2.709783491 1.425313279
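A validator for this format might look like the following minimal sketch. It assumes the columns are separated by real TAB characters, as the format requires (the example above may appear space-separated after copying), and treats a first non-numeric line as the optional header. The function name parse_expression_file and its error messages are hypothetical.

```python
from typing import List, Optional, Tuple

Row = Tuple[str, float, float]  # (gene_id, control level, treatment level)


def parse_expression_file(text: str) -> Tuple[Optional[List[str]], List[Row]]:
    """Parse a TAB-delimited, three-column gene expression file.

    Returns (header, rows); header is None when the optional header line is
    absent. Raises ValueError for any malformed line, which the web
    application could turn into its "provide the correct format" page.
    """
    header: Optional[List[str]] = None
    rows: List[Row] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = [f.strip() for f in line.split("\t")]
        if len(fields) != 3:
            raise ValueError(
                f"line {line_no}: expected 3 TAB-separated columns, got {len(fields)}"
            )
        gene_id, control, treatment = fields
        try:
            rows.append((gene_id, float(control), float(treatment)))
        except ValueError:
            # Only the first non-blank line may be a (non-numeric) header.
            if header is None and not rows:
                header = fields
                continue
            raise ValueError(
                f"line {line_no}: expression levels must be numbers"
            ) from None
    if not rows:
        raise ValueError("the file contains no gene expression lines")
    return header, rows
```

Raising a single exception type for every format problem keeps the web layer simple: it only needs to catch ValueError to show the error page.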

Output (Data Analysis)

Given a valid gene expression file, the web application displays a table and a scatter plot.

Table

The table contains a list of differentially expressed genes with the following format:

gene_id control_sample knockout_sample log_2[FC]
AT1G01010 1.198558083 2.036161827 0.76
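This SRS does not define the cut-off for "significantly different" expression, so the following sketch takes it as a parameter. The function name differential_table, the parameter min_abs_logfc, and the choice to skip genes with a zero expression level (where log2(T/C) is undefined) are all assumptions. With a cut-off of 0.5 it reproduces the example row above, since log2(2.036161827 / 1.198558083) ≈ 0.7646, which rounds to 0.76.

```python
import math
from typing import Iterable, List, Tuple


def differential_table(
    rows: Iterable[Tuple[str, float, float]], min_abs_logfc: float
) -> List[str]:
    """Build TAB-separated output lines for genes with |log2(T/C)| >= min_abs_logfc.

    rows are (gene_id, control, treatment) tuples; the first returned line
    is the column header used in the SRS.
    """
    lines = ["gene_id\tcontrol_sample\tknockout_sample\tlog_2[FC]"]
    for gene_id, control, treatment in rows:
        if control <= 0 or treatment <= 0:
            continue  # assumption: logFC undefined, so the gene is skipped
        logfc = math.log2(treatment / control)
        if abs(logfc) >= min_abs_logfc:
            # logFC rounded to two decimals, as in the example row
            lines.append(f"{gene_id}\t{control}\t{treatment}\t{logfc:.2f}")
    return lines
```

Keeping the cut-off as an argument lets the team settle on a threshold (or replace it with a statistical test) without changing the output format.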

Scatter Plot

The scatter plot displays the differentially expressed genes, with Control on the X-axis and KnockOut on the Y-axis. If the uploaded file includes a header line, ‘Control’ and ‘KnockOut’ are replaced with the corresponding column names from that line. Up-regulated genes are shown as red dots, and down-regulated genes as blue dots.

Figure: scatter plot of differentially expressed genes (_images/GED.png)
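The data behind the plot can be prepared independently of any plotting library. This sketch (hypothetical function scatter_points) assumes its input rows are already the differentially expressed genes as (gene_id, control, treatment) tuples, and that the optional header is the three-item list of column names from the uploaded file. The returned labels, coordinates, and colors could then be handed to a plotting library such as matplotlib.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[str, float, float, str]  # (gene_id, x = control, y = treatment, color)


def scatter_points(
    rows: Sequence[Tuple[str, float, float]],
    header: Optional[Sequence[str]] = None,
) -> Tuple[str, str, List[Point]]:
    """Return (x_label, y_label, points) for the differential-expression scatter plot."""
    # Default axis names from the SRS; replaced by the file's column names if present.
    x_label, y_label = ("Control", "KnockOut") if header is None else (header[1], header[2])
    points: List[Point] = []
    for gene_id, control, treatment in rows:
        if treatment > control:
            color = "red"  # up-regulated: higher expression in treatment
        elif treatment < control:
            color = "blue"  # down-regulated: lower expression in treatment
        else:
            continue  # equal levels: neither up- nor down-regulated, not plotted
        points.append((gene_id, control, treatment, color))
    return x_label, y_label, points
```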

Non-functional Requirements

This section specifies the non-functional requirements that contribute to a better user experience.

Response Time

Because the web application is intended for biologists around the world, the server may have to handle a heavy load of requests, which could cause the system to go down repeatedly or stop responding to clients. In addition, users may submit large amounts of data, so the time complexity of the analysis must be taken into account, and the algorithm must be efficient enough to handle such inputs. After discussion, we decided that the response time should be less than 5 seconds for most use cases.
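One way to check the 5-second target during development is to time the core analysis on a synthetic data set of realistic size. The sketch below is an illustration under assumptions: the analyze function is a hypothetical stand-in for OMG's analysis, and the 30,000-gene input size and random expression values are made up for the test. It measures computation time only, not network latency or server load.

```python
import math
import random
import time
from typing import List, Tuple

RESPONSE_TIME_LIMIT_S = 5.0  # target from this SRS: under 5 seconds for most use cases


def analyze(rows: List[Tuple[str, float, float]]) -> List[Tuple[str, float]]:
    """Hypothetical stand-in analysis: one O(n) pass computing log2(T/C) per gene."""
    return [(g, math.log2(t / c)) for g, c, t in rows if c > 0 and t > 0]


def timed_analysis(n_genes: int, seed: int = 0) -> Tuple[int, float]:
    """Run analyze() on n_genes synthetic genes; return (genes analyzed, seconds)."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    rows = [(f"GENE{i}", rng.uniform(0.1, 1000.0), rng.uniform(0.1, 1000.0))
            for i in range(n_genes)]
    start = time.perf_counter()
    result = analyze(rows)
    elapsed = time.perf_counter() - start
    return len(result), elapsed


if __name__ == "__main__":
    count, seconds = timed_analysis(30_000)
    print(f"{count} genes in {seconds:.3f} s (limit {RESPONSE_TIME_LIMIT_S} s)")
```

A check like this can run in the acceptance tests so that a slower algorithm is noticed before release; load testing of the server itself would need a separate tool.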

Aesthetic Aspects

As a web application for biological data analysis, OMG should have an interface that is as simple as possible and that presents the scatter plot and table clearly.

Confidentiality Policy

As a public web application, OMG must treat data security as a significant concern. This is especially important for biologists, for whom every experimental result is hard-won. Therefore, user data must be used and stored properly while the application is in use. In other words, the web application must maintain the reliability, integrity, and confidentiality of experimental data, and must not disclose the data without the owner's consent.

Constraints

This application is intended for scientific analysis by users (biologists) all over the world. Before developing the project, we must take the main constraints into account so that we can plan how to deal with them.

Browser Compatibility

Since the application is intended for wide use, it should be compatible across platforms to satisfy various users. In particular, it should be accessible through Firefox, Chrome, and Safari.

Space Complexity

In addition to time complexity, space complexity must be considered, in two respects. First, the whole web application should not exceed 1 GB in size. Second, during data analysis, memory usage must be kept below a specified level so that the system continues to work properly.
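One common way to keep memory use bounded regardless of file size is to process the upload line by line instead of loading it all at once. The following minimal sketch (hypothetical function stream_logfc) uses a Python generator over an open text stream; it assumes TAB-separated lines, treats a non-numeric first line as the optional header, and skips genes with a zero expression level. Full input validation is omitted for brevity.

```python
import io
import math
from typing import Iterator, TextIO, Tuple


def stream_logfc(handle: TextIO) -> Iterator[Tuple[str, float]]:
    """Yield (gene_id, log2(T/C)) one line at a time.

    Only the current line is held in memory, so memory use does not grow
    with the size of the uploaded file.
    """
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # blank line
        fields = line.split("\t")
        if len(fields) != 3:
            raise ValueError(f"line {line_no}: expected 3 TAB-separated columns")
        gene_id, control, treatment = fields
        try:
            c, t = float(control), float(treatment)
        except ValueError:
            if line_no == 1:
                continue  # optional header line
            raise
        if c > 0 and t > 0:  # log2(T/C) is undefined if either level is 0
            yield gene_id, math.log2(t / c)


if __name__ == "__main__":
    sample = io.StringIO("gene_id\tControlSample\tKnockOutSample\nA\t1\t2\nB\t0\t0\n")
    for gene_id, logfc in stream_logfc(sample):
        print(gene_id, logfc)
```

The results still need to be collected for the table and plot, but only for the genes that pass the differential-expression filter, which is usually much smaller than the input.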

Budget

The budget must be less than 10,000 USD (to be specified).

System Downtime

System downtime must be less than 30 minutes per year to meet users' high demand for data analysis.
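As a quick check of what this target means, 30 minutes of downtime in a 365-day year corresponds to an availability of about 99.994%. The helper name availability is introduced here for illustration only.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year


def availability(max_downtime_minutes: float) -> float:
    """Fraction of the year the system is up, given the allowed yearly downtime."""
    return 1 - max_downtime_minutes / MINUTES_PER_YEAR


print(f"{availability(30):.4%}")  # 99.9943%
```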

Change Cases

  1. In the future, support Excel files in addition to plain text (.txt) files.
  2. In the long run, support downloading the analysis results.
  3. Add more functions to the application.

(To Be Added…)

Milestones

  1. Submit SRS for review by May 1st.
  2. Get SRS approved by May 8th.
  3. Get design done by May 10th.
  4. Get coding done by May 24th.
  5. Complete acceptance tests by June 1st.
  6. Release by June 15th.

Appendices

Change Log

April 26th:
  1. Specified the basic functions and interface of the application.
  2. Detailed the non-functional requirements.
  3. Clarified some constraints on the application.
  4. Identified some change cases and considered ways to handle them.
  5. Decided on the project development milestones.


Group Information

Group Name:

BuiGia

Group Members:

  1. 宋一豪(Bob 201632120126): 827030988@qq.com
  2. 徐宇泽(Universe 201632120128): universe_black@qq.com
