Preliminary - Structure


Introduction

Before any server can be set up, we have to design the structure of the entire system. This section has two parts: the software structure and the hardware structure.

Software Pipeline

Before the exact number of hardware servers can be decided, the initial software pipeline needs to be fixed.

Download

The first step is to download the data. Once downloaded, the data needs to be validated, because it is not rare for some files to be damaged during transfer. Any file that fails validation has to be downloaded again.
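The download, validate and retry loop can be sketched as follows. This is a minimal sketch, assuming the data source publishes a SHA-256 checksum for every file; `fetch` is a hypothetical stand-in for whatever transfer tool is actually used, and `download_and_validate` is an illustrative name, not part of the original setup.

```python
import hashlib
from typing import Callable, Dict, List


def sha256_of(data: bytes) -> str:
    """Return the hex SHA-256 digest of a downloaded payload."""
    return hashlib.sha256(data).hexdigest()


def download_and_validate(
    expected: Dict[str, str],
    fetch: Callable[[str], bytes],
    max_attempts: int = 3,
) -> Dict[str, bytes]:
    """Download every file and re-download any file whose checksum does not match.

    `expected` maps each file name to its expected SHA-256 digest (assumed to be
    published by the data source); `fetch` is a hypothetical transfer function.
    """
    valid: Dict[str, bytes] = {}
    pending: List[str] = list(expected)
    for _ in range(max_attempts):
        failed: List[str] = []
        for name in pending:
            data = fetch(name)
            if sha256_of(data) == expected[name]:
                valid[name] = data
            else:
                failed.append(name)  # damaged during transfer: try again next round
        if not failed:
            return valid
        pending = failed
    raise RuntimeError(f"files still invalid after {max_attempts} attempts: {pending}")
```

Passing the transfer in as a function keeps the retry logic independent of the download tool, so it can be tested with a deliberately flaky fetcher.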

Analysis

Once the data is downloaded, each dataset needs to be assigned to one of the three analysis servers. The results then have to be collected and stored on the storage server.
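The text does not say how datasets are distributed across the three analysis servers. One possible policy, shown only as a sketch, is a greedy least-loaded assignment: the largest datasets are placed first, each on whichever of C1, C2 and C3 currently holds the smallest total size. The dataset names and sizes are hypothetical inputs.

```python
import heapq
from typing import Dict, List, Tuple

ANALYSIS_SERVERS = ["C1", "C2", "C3"]  # the three bare-metal analysis servers


def assign_datasets(
    sizes: Dict[str, int], servers: List[str] = ANALYSIS_SERVERS
) -> Dict[str, List[str]]:
    """Assign each dataset to the server with the smallest total load so far.

    Datasets are processed from largest to smallest, which keeps the final
    loads reasonably balanced (the classic greedy heuristic for this problem).
    """
    heap: List[Tuple[int, str]] = [(0, s) for s in servers]  # (bytes assigned, server)
    heapq.heapify(heap)
    plan: Dict[str, List[str]] = {s: [] for s in servers}
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        load, server = heapq.heappop(heap)  # least-loaded server
        plan[server].append(name)
        heapq.heappush(heap, (load + size, server))
    return plan
```

A simple round-robin would also work; balancing by size matters more when dataset sizes vary widely.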

Storage

Processed data needs to be re-compressed first and then stored in both the tape library and the storage servers. The analysis results should also be kept securely in storage.
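The section does not name the compression formats. As a minimal sketch, assuming the downloaded files arrive gzip-compressed and are re-compressed as xz (LZMA) with the Python standard library, the re-compression step could verify its own output before anything is written to storage:

```python
import gzip
import hashlib
import lzma
from typing import Tuple


def recompress(gz_payload: bytes, preset: int = 9) -> Tuple[bytes, str]:
    """Re-compress a gzip payload as xz and return it with its SHA-256 digest.

    The new archive is decompressed once and compared with the original data
    before it is returned, so a faulty re-compression is caught before the file
    reaches the tape library or the storage servers. The digest can be stored
    alongside both copies to check them later.
    """
    raw = gzip.decompress(gz_payload)
    xz_payload = lzma.compress(raw, preset=preset)
    if lzma.decompress(xz_payload) != raw:
        raise ValueError("re-compressed payload does not match the original data")
    return xz_payload, hashlib.sha256(xz_payload).hexdigest()
```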

Software Structure

[Figure: software structure]

Hardware Assignment

The General Server (CA)

Before the data can be processed on the analysis servers, it has to be downloaded and validated. Once the data is processed, the results and the source data need to be fetched from the analysis servers and saved to the storage system. These are data-intensive operations, which makes a VM unsuitable. Therefore, I assigned a general-purpose hardware server to these tasks.

Analysis Servers (C1-3)

Based on the previous estimates, we need at least three analysis servers. For the best performance, these three bare-metal servers should be dedicated to analysis.

Compression (BA, BB)

As discussed before, there are two compression servers. Given the high CPU demand of compression, each server should be dedicated to compression tasks.
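Dedicating a server to compression mostly means keeping all of its cores busy. A minimal sketch, assuming files are compressed independently with xz: CPython's `lzma` module releases the GIL while it compresses, so a thread pool sized to `os.cpu_count()` can run one compression per core. `compress_all` is an illustrative name.

```python
import lzma
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Optional, Tuple


def _compress_one(item: Tuple[str, bytes]) -> Tuple[str, bytes]:
    """Compress a single (name, data) pair with xz at the default preset."""
    name, data = item
    return name, lzma.compress(data)


def compress_all(files: Dict[str, bytes], workers: Optional[int] = None) -> Dict[str, bytes]:
    """Compress every file in parallel, using one worker per CPU core by default."""
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(_compress_one, files.items()))
```

On a server that runs nothing else, using every core is the point; on a shared machine one would pass a smaller `workers` value.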

Storage

The design of the storage servers depends on additional factors, which are not considered here yet. Compression server A (BA) is responsible for transferring files to the tape library, while compression server B (BB) is responsible for transferring files to the HDD storage cluster.
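The split of transfer duties can be written down as a small routing table. The sketch below assumes each compression server already holds the files it is meant to hand on; the destination labels `tape-library` and `hdd-cluster` and the function `plan_transfers` are hypothetical names, not part of the original setup.

```python
from typing import Dict, List

# Which compression server hands its output to which storage back end.
TRANSFER_ROUTES: Dict[str, str] = {
    "BA": "tape-library",  # compression server A
    "BB": "hdd-cluster",   # compression server B
}


def plan_transfers(outputs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Group compressed files by destination, based on the server that produced them."""
    plan: Dict[str, List[str]] = {dest: [] for dest in TRANSFER_ROUTES.values()}
    for server, files in outputs.items():
        if server not in TRANSFER_ROUTES:
            raise KeyError(f"no storage route for server {server!r}")
        plan[TRANSFER_ROUTES[server]].extend(files)
    return plan
```

Keeping the mapping in one table means a later change to the storage design only touches this table, not the transfer code.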

Rack Assignment

Considering cooling and power requirements, the servers are split into two racks: rack A for storage and rack B for computing.

Hardware Structure

[Figure: hardware structure]
