Lack of knowledge is often a bigger problem than lack of tools or skills to work with.
This is probably the situation with relatively uncommon things in this world... imagine a new sport, a newly discovered planet, new archaeological evidence, or even a recent technology.
Big Data, and especially Big Data Testing, is probably in a similar zone at present. Like any product, to test Big Data based products we need:
- different testing types (functional and non-functional),
- a well-formed test data management approach,
- thoroughly planned test environment management.
Big Data processing involves three steps: gathering the data from various nodes, performing the (MapReduce) operations to get the output, and loading the output onto downstream systems for further processing.
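As a hedged illustration of the middle step, here is a minimal map/reduce style word count in plain Python; the sample log lines and the word-count use case are assumptions for illustration, and a real cluster would run equivalent logic through Hadoop or Spark rather than a single script.

from collections import defaultdict

def map_phase(lines):
    # Emit (key, 1) pairs for every word in the extracted data.
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reduce_phase(pairs):
    # Aggregate the intermediate pairs into final counts.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

if __name__ == "__main__":
    extracted = ["ERROR disk full", "INFO job done", "ERROR disk full"]
    print(reduce_phase(map_phase(extracted)))  # e.g. {'error': 2, 'disk': 2, ...}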
As the technology deals with huge volumes of data, functional testing needs to be carried out at every stage to detect coding errors and/or configuration (node) errors. This means that functional testing should involve a minimum of three stages (see the sketch after this list):
- Pre-processing validation (on extracted data)
- Validation of processed data (before loading onto downstream systems)
- Validation of extracted and loaded data.
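A minimal sketch of how those three checkpoints could be chained, assuming record counts and content checksums as the validation criteria; the function names and the pipe-delimited sample records are hypothetical.

import hashlib

def checksum(records):
    # Order-independent fingerprint of a set of records.
    digest = hashlib.sha256()
    for rec in sorted(records):
        digest.update(rec.encode("utf-8"))
    return digest.hexdigest()

def validate_stage(name, expected, actual):
    # Stage-level check: record count plus content fingerprint.
    assert len(expected) == len(actual), f"{name}: record count mismatch"
    assert checksum(expected) == checksum(actual), f"{name}: content mismatch"
    print(f"{name}: OK ({len(actual)} records)")

source    = ["r1|10", "r2|20", "r3|30"]   # data on the source nodes
extracted = ["r1|10", "r2|20", "r3|30"]   # after extraction
processed = ["r1|10", "r2|20", "r3|30"]   # after MapReduce (identity here)
loaded    = ["r1|10", "r2|20", "r3|30"]   # what landed downstream

validate_stage("Pre-processing validation", source, extracted)
validate_stage("Processed data validation", extracted, processed)
validate_stage("Load validation", processed, loaded)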
Big Data technology is also associated with a number of "V"s - some say three, some say four or even five. From a testing perspective, we will consider Volume, Velocity and Variety.
Volume:
Manual comparison is out of the question considering the quantity. It might be carried out only in exceptional instances, and even then, in my opinion, with a sampling technique.
File comparison scripts / tools can instead be run in parallel on multiple nodes, as sketched below.
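One way to script such a comparison using only the standard library: hash each file pair in a worker process and report mismatches. The (source, target) file paths below are hypothetical placeholders for the per-node outputs.

import hashlib
from multiprocessing import Pool

def file_digest(path):
    # Hash the file in chunks so huge files do not exhaust memory.
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def compare_pair(pair):
    source, target = pair
    return source, target, file_digest(source) == file_digest(target)

if __name__ == "__main__":
    # Hypothetical (source, target) pairs, one per node under test.
    pairs = [("node1/out.csv", "expected/node1.csv"),
             ("node2/out.csv", "expected/node2.csv")]
    with Pool() as pool:
        for source, target, same in pool.map(compare_pair, pairs):
            print(f"{source} vs {target}: {'MATCH' if same else 'MISMATCH'}")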
Velocity:
Performance testing provides vital inputs on the speed of operation and the throughput of certain processes; a simple throughput measurement is sketched below.
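A minimal sketch of that kind of throughput measurement, timing a batch of records through a processing function; the batch size and the body of process_record are placeholders, not the real workload.

import time

def process_record(record):
    # Placeholder for the real transformation under test.
    return record.upper()

def measure_throughput(records):
    start = time.perf_counter()
    for record in records:
        process_record(record)
    elapsed = time.perf_counter() - start
    return len(records) / elapsed if elapsed else float("inf")

batch = [f"record-{i}" for i in range(100_000)]
print(f"throughput: {measure_throughput(batch):,.0f} records/sec")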
Variety:
Unstructured data (text based), social media data, log files, etc. are some of the formats that add to the variety of data handled by Big Data.
To compare structured data, scripts need to be prepared that produce the output in the desired format; the actual output can then be compared with the expected output.
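For example, a script could load both the expected and the actual output as CSV, index the rows by a key column, and report missing, extra and changed rows. The file names and the "id" key column below are assumptions for illustration.

import csv

def load_rows(path, key="id"):
    # Read a CSV file and index its rows by a key column.
    with open(path, newline="") as handle:
        return {row[key]: row for row in csv.DictReader(handle)}

def compare_outputs(expected_path, actual_path):
    expected, actual = load_rows(expected_path), load_rows(actual_path)
    missing = expected.keys() - actual.keys()
    extra = actual.keys() - expected.keys()
    changed = [k for k in expected.keys() & actual.keys()
               if expected[k] != actual[k]]
    return missing, extra, changed

# Hypothetical file names for the expected and the actual MapReduce output.
missing, extra, changed = compare_outputs("expected.csv", "actual.csv")
print(f"missing: {sorted(missing)}, extra: {sorted(extra)}, changed: {sorted(changed)}")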
Verifying unstructured data is largely a manual testing activity. Automation may not pay off here due to the variety of formats handled. The best bet is to analyse the unstructured data and build the best possible test scenarios to get maximum coverage.
EDT (Environment, Data, Tools) of NFR testing:
Setting up the test environment, building dummy data in volume and utilising the proper tools are key aspects of non-functional testing, and these are no different in Big Data Testing.
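A hedged sketch of "building dummy data in volume" with nothing but the standard library; the trade-like schema, the output file name and the row count are made up for illustration and would need to match the system under test.

import csv
import random

def generate_dummy_rows(count):
    # Yield synthetic rows; adjust the schema to your system under test.
    for i in range(count):
        yield {"id": i,
               "amount": round(random.uniform(1, 1_000_000), 2),
               "currency": random.choice(["USD", "EUR", "GBP"])}

def write_dummy_file(path, count):
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["id", "amount", "currency"])
        writer.writeheader()
        writer.writerows(generate_dummy_rows(count))

write_dummy_file("dummy_load_test.csv", 1_000_000)  # roughly 1M rows per file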
Situational Tests:
The situation that induced the adoption of Big Data should also be reflected while building test scenarios.
e.g. for an investment bank, government regulation may drive the need for Big Data based structures.
Big Data has a profound impact on the global economy; Big Data testing in turn demands a good mix of innovation and common sense, tools and test cases. The testing community should evolve and live up to this - we have done it in the past and we will keep doing it in the future.