Database & Storage System
Design the storage backends for cold and hot data. Manage the databases and their optimization, the disk array & tape library, and the access protocols for different applications.
I have completed many personal projects, and you can find most of them in the sidebar. Some of them are marked as featured; I believe those are the most interesting. However, these projects are only my explorations of the CS field, not professional academic research. For academically focused research, please refer to the Research Projects section.
Distributed web architecture with security in mind. Manage the access protocols, security, and architecture for the dedicated server clusters behind various projects.
Apply CS skills to other areas, such as bioinformatics. Query, download, preprocess, and analyze the required gene sequences from the SRA database. The results were used to support a genetic research project.
Application of ML models using PyTorch and TensorFlow. A complete pipeline for data preprocessing, optimization, and custom fits for the bio projects. NLP models are used for text processing.
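The actual pipeline is not shown here, but one typical preprocessing step, standardizing each numeric feature to zero mean and unit variance before fitting, can be sketched framework-free (the function names and the z-score choice are illustrative assumptions, not the project's real code):

```python
from statistics import mean, stdev

def standardize(column):
    """Scale one numeric feature to zero mean and unit variance (z-score)."""
    m, s = mean(column), stdev(column)
    return [(x - m) / s for x in column]

def preprocess(rows):
    """Standardize every feature column of a row-major numeric dataset."""
    columns = list(zip(*rows))                  # transpose to column-major
    scaled = [standardize(list(c)) for c in columns]
    return [list(r) for r in zip(*scaled)]      # transpose back to row-major
```

In a real PyTorch or TensorFlow pipeline the same transform would usually be vectorized (e.g. over tensors) and fitted on the training split only, then reused on validation and test data.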
From painting the walls to deploying servers, I designed and set up my own little datacenter to host all of my server hardware and storage equipment. This is my milestone of 2022. Every detail of the room, including electricity, ventilation & air conditioning, fiber networking, access security, windows & walls, and server rack capacity, was designed and mostly deployed by myself.
I have been deeply involved in a genetic research project since 2020. I am responsible for filtering and downloading data, designing & deploying servers, configuring the network & equipment, maintaining databases, and developing the pipeline scripts that automate the entire process. I take care of all the CS-related work so that the others can focus on their biology.
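The pipeline scripts themselves are not shown, but as one hedged illustration of the query step: SRA can be searched programmatically through NCBI's public E-utilities `esearch` endpoint. The sketch below only builds the request URL; the search term is a made-up placeholder, not the project's actual query:

```python
from urllib.parse import urlencode

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils"

def sra_search_url(term, retmax=100):
    """Build an NCBI E-utilities esearch URL for the SRA database.

    The returned URL can be fetched to get matching SRA record IDs,
    which a download step (e.g. sra-tools) would then retrieve.
    """
    query = urlencode({"db": "sra", "term": term,
                       "retmax": retmax, "retmode": "json"})
    return f"{EUTILS}/esearch.fcgi?{query}"
```

Keeping the query construction in one small function makes it easy to log every search the automated pipeline performs.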
My little data center is in one location, yet its visitors come from around the world (including myself). This creates a demand for high-quality networking. To ensure a smooth experience for most visitors, satellite nodes were deployed around the world as network proxies; these nodes have optimized network connections to the data center. All traffic is first sent to one of the access points and then passed to the appropriate backend.
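The core of such an access point is a plain TCP relay: accept a visitor's connection locally and pump bytes both ways to the backend over the optimized link. A minimal sketch with `asyncio` (the hostnames, ports, and function names are illustrative assumptions, not the deployed software):

```python
import asyncio

async def pump(reader, writer):
    """Copy bytes from reader to writer until EOF, then close the writer."""
    try:
        while data := await reader.read(4096):
            writer.write(data)
            await writer.drain()
    finally:
        writer.close()

def make_handler(backend_host, backend_port):
    """Return a handler that relays each accepted client to the backend."""
    async def handle(client_reader, client_writer):
        backend_reader, backend_writer = await asyncio.open_connection(
            backend_host, backend_port)
        await asyncio.gather(
            pump(client_reader, backend_writer),   # client -> data center
            pump(backend_reader, client_writer),   # data center -> client
        )
    return handle

async def run_access_point(listen_host, listen_port, backend_host, backend_port):
    """Accept traffic at the satellite node and forward it to the data center."""
    return await asyncio.start_server(
        make_handler(backend_host, backend_port), listen_host, listen_port)
```

A production node would add TLS, connection limits, and health checks on the backend link; this only shows the forwarding idea.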
Each application, or even each website, sits in its own VM. A Linux system tends to be small, but the application data it hosts can be large and ever-changing. A hybrid storage model was therefore adopted: only the system disk is provisioned on the hypervisor's SSD array, while all other storage lives on network drives mounted from one of the storage nodes, whose capacity can be adjusted dynamically.
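Inside each VM, a network mount of this kind typically boils down to one `/etc/fstab` line. As a small sketch (assuming NFS; the node name, export path, and mount options are hypothetical, not the actual configuration):

```python
def nfs_fstab_entry(storage_node, export_path, mount_point,
                    options="rw,hard,vers=4.2"):
    """Render an /etc/fstab line mounting an NFS export from a storage node.

    Because capacity lives on the storage node, growing the share there
    is enough; the VM's fstab line never has to change.
    """
    return f"{storage_node}:{export_path}  {mount_point}  nfs  {options}  0  0"
```

Keeping the data on the storage node is what makes resizing dynamic: the small system disk on the SSD array stays fixed while the network share grows or shrinks independently.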
Security is always a priority. TLS certificates are needed for most internal and management connections, but it is neither practical nor affordable to purchase them from external CAs. A self-signed root CA was therefore adopted. Just like any well-known CA, it signs multiple intermediate CAs that issue the various certificates, while the root itself is kept isolated in a secure location.
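The two-tier hierarchy can be sketched with the `cryptography` package: a self-signed root whose `path_length=1` basic constraint permits exactly one level of intermediates, and an intermediate with `path_length=0` that issues leaf certificates. The CA names are made up for illustration; the actual setup may well use different tooling such as OpenSSL:

```python
import datetime
from cryptography import x509
from cryptography.x509.oid import NameOID
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa

def make_ca(common_name, issuer_name=None, issuer_key=None, path_length=None):
    """Create a CA certificate; self-signed when no issuer is given."""
    key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    subject = x509.Name([x509.NameAttribute(NameOID.COMMON_NAME, common_name)])
    now = datetime.datetime.now(datetime.timezone.utc)
    cert = (
        x509.CertificateBuilder()
        .subject_name(subject)
        .issuer_name(issuer_name if issuer_name is not None else subject)
        .public_key(key.public_key())
        .serial_number(x509.random_serial_number())
        .not_valid_before(now)
        .not_valid_after(now + datetime.timedelta(days=3650))
        .add_extension(x509.BasicConstraints(ca=True, path_length=path_length),
                       critical=True)
        .sign(issuer_key if issuer_key is not None else key, hashes.SHA256())
    )
    return cert, key

# Root CA: self-signed, kept offline; may sign one level of intermediates.
root_cert, root_key = make_ca("Homelab Root CA", path_length=1)

# Intermediate CA: signed by the root, used for day-to-day issuance.
int_cert, int_key = make_ca("Homelab Intermediate CA",
                            issuer_name=root_cert.subject,
                            issuer_key=root_key,
                            path_length=0)
```

With this split, only the intermediate key ever touches the online infrastructure; compromising it never exposes the offline root, which can revoke and replace the intermediate.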
Starting in junior high school, I have been hosting Minecraft servers for more than 10 years. I created a BungeeCord network consisting of multiple servers, including proxy, lobby, survival, creative, and sub-game servers, each with its own configuration and even game version. Everything is hosted on my own server hardware with a BGP-optimized network, and open-source management software was configured to provide web-based control.
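In a BungeeCord setup, the grouping lives in the proxy's `config.yml`: backend servers are registered under `servers`, and the listener's `priorities` list decides where players land first. A minimal fragment (the addresses and server names here are placeholders, not my actual configuration):

```yaml
servers:
  lobby:
    address: 10.0.0.11:25565
    restricted: false
  survival:
    address: 10.0.0.12:25565
    restricted: false
listeners:
- host: 0.0.0.0:25577
  priorities:
  - lobby
```

Players connect only to the proxy port; the backend servers run in offline mode behind it, which is why each can keep its own configuration and game version.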