Draft: Generating Random Text Data for testing V-Sekai systems

Metadata

Status: Draft
Deciders: V-Sekai, Bioblaze Payne, fire
Tags: V-Sekai, GPT-4, 20230527-server-performance-test

Context and Problem Statement

V-Sekai is a project that aims to provide an immersive virtual world experience for users. As the project grows in complexity, it becomes increasingly important to ensure that the system is thoroughly tested and can handle various scenarios. One of the challenges faced by the V-Sekai team is the lack of diverse and realistic data for testing purposes. This limitation hinders the ability to identify potential issues and optimize the system’s performance.

Proposed Solution

To address the limitations in testing the V-Sekai system due to the lack of diverse and realistic data, this proposal suggests generating random text data for testing purposes.

Implementation

We will use a tool like Chance.js library to generate random text data for testing purposes and use the database.

Text Processing

For generating random sentences, we will use something like the following method:

chance.sentence();

Reference: Chance.js Sentence

User Data

For generating random user names, we will use something like the following method:

chance.name();

Reference: Chance.js Name

Database Operations

We will perform the following SQL operations on the user and chat tables:

SELECT All Data in Table
SELECT LIKE Data based on %NA%
SELECT Specific Row Data
INSERT
UPDATE
DELETE

Tables

Table user

Table chat

SQL Function

We will create a SQL function to check if the username/password match and return ‘OK’. We will also check if the query returned is ‘OK’ and add false information randomly in the nodes to signify a user doing invalid calls for production testing.

The implementation will generate 90 days of data, generating 100,000 text data entries for testing purposes.

Positive Consequences

Improved testing capabilities with diverse and realistic data
Enhanced system reliability due to thorough testing
Identification of potential issues before they impact users

Negative Consequences

Additional development time required to implement the solution
Potential increase in storage requirements for generated data

Option Graveyard

Using static test data: This option was considered for simplicity, but it would not provide the diverse and realistic data needed for thorough testing.
Manual data generation: This option involved manually creating test data, but it was deemed time-consuming and inefficient compared to using a library like Chance.js.
Third-party data generation services: While there are external services that can generate random text data, using them would introduce dependencies and potential costs. Implementing our a solution similar to Chance.js allows for better control over the process.
Crowdsourcing test data: This option involved gathering test data from users or volunteers. However, it raised concerns about data privacy, consistency, and quality.

If this enhancement will be used infrequently, can it be worked around with a few lines of script?

No, this enhancement is essential for thorough testing and cannot be easily worked around with a few lines of script.

Is there a reason why this should be core and done by us?

Yes, addressing the V-Sekai limitation is crucial for ensuring the system’s reliability and performance. Implementing this solution internally allows for better control over the testing process and data generation.