Background
Semi-structured data is a hybrid data type that possesses some level of structure in the form of tags, labels, and similar markers that help identify and categorize the data. Common semi-structured data formats include XML, JSON, and CSV. There are two typical processing schemes for querying semi-structured data: preprocessing schemes and streaming schemes. Preprocessing schemes first parse the data into an in-memory structure before querying and therefore require a large memory footprint. Streaming schemes locate and extract the data on the fly and thereby reduce the memory footprint, but they still need to parse the entire data stream character by character, which limits query efficiency.
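The difference between the two schemes can be illustrated with a minimal Python sketch. It is not taken from the invention: the function names query_preprocessing and query_streaming are hypothetical, and the query is limited to looking up one member of a top-level JSON object. The preprocessing version builds the whole tree first, while the streaming version walks the text character by character and decodes only the matching value.

```python
import json

WHITESPACE = " \t\r\n"

def query_preprocessing(text, key):
    """Preprocessing scheme: build the full in-memory tree, then look up the key."""
    tree = json.loads(text)
    return tree[key]

def query_streaming(text, key):
    """Streaming scheme: scan character by character and decode only the match."""
    decoder = json.JSONDecoder()
    depth = 0
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if ch == '"':
            # Scan to the closing quote, stepping over backslash escapes.
            j = i + 1
            while text[j] != '"':
                j += 2 if text[j] == "\\" else 1
            token = text[i:j + 1]
            i = j + 1
            if depth == 1:
                # A string at depth 1 that is followed by ':' is a member name.
                k = i
                while k < n and text[k] in WHITESPACE:
                    k += 1
                if k < n and text[k] == ":" and json.loads(token) == key:
                    k += 1
                    while text[k] in WHITESPACE:
                        k += 1
                    value, _ = decoder.raw_decode(text, k)
                    return value
            continue
        if ch in "{[":
            depth += 1
        elif ch in "}]":
            depth -= 1
        i += 1
    raise KeyError(key)

doc = '{"a": {"b": 1}, "b": [1, 2]}'
result_tree = query_preprocessing(doc, "b")
result_stream = query_streaming(doc, "b")
```

Both calls return the same value, but the streaming version never materializes the nested object under "a". It still has to visit every character that precedes the match, which is the inefficiency described above.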
Technology
Prof. Zhijia Zhao and his team have developed JSONSki, a novel querying framework for streaming JSON data that accelerates query processing by fast-forwarding the data stream wherever doing so is possible and practical. JSONSki offers a set of application programming interfaces (APIs) that can be invoked during streaming to dynamically fast-forward over different cases of substructures that are irrelevant to the query.
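The idea of fast-forwarding over irrelevant substructures can be sketched as follows. This is an illustration only and does not reproduce the JSONSki APIs: the helpers skip_string, skip_value, skip_ws, and find_member are hypothetical names, and the sketch skips values with plain scalar bracket counting in Python, whereas a production implementation would be expected to use much faster techniques. It shows only the control flow: when a member name does not match the query, its value is stepped over without being decoded.

```python
import json

WS = " \t\r\n"

def skip_ws(text, i):
    """Return the index of the first non-whitespace character at or after i."""
    while i < len(text) and text[i] in WS:
        i += 1
    return i

def skip_string(text, i):
    """Return the index just past the string literal that opens at text[i]."""
    i += 1
    while text[i] != '"':
        i += 2 if text[i] == "\\" else 1
    return i + 1

def skip_value(text, i):
    """Fast-forward over one JSON value starting at text[i] without decoding it."""
    ch = text[i]
    if ch == '"':
        return skip_string(text, i)
    if ch in "{[":
        depth = 0
        while True:
            ch = text[i]
            if ch == '"':
                # Brackets inside strings must not affect the depth count.
                i = skip_string(text, i)
                continue
            if ch in "{[":
                depth += 1
            elif ch in "}]":
                depth -= 1
                if depth == 0:
                    return i + 1
            i += 1
    # Number, true, false, or null: runs until the next delimiter.
    while i < len(text) and text[i] not in ",}]" + WS:
        i += 1
    return i

def find_member(text, key):
    """Return the raw text of the value for `key` in a top-level object."""
    i = skip_ws(text, 0)
    if text[i] != "{":
        raise ValueError("expected a top-level JSON object")
    i = skip_ws(text, i + 1)
    while text[i] != "}":
        name_end = skip_string(text, i)
        name = json.loads(text[i:name_end])
        i = skip_ws(text, name_end)       # now at ':'
        i = skip_ws(text, i + 1)          # now at the start of the value
        value_end = skip_value(text, i)   # step over the value, relevant or not
        if name == key:
            return text[i:value_end]
        i = skip_ws(text, value_end)
        if text[i] == ",":
            i = skip_ws(text, i + 1)
    raise KeyError(key)

doc = '{"skip": {"x": [1, {"y": "}"}]}, "n": 42, "want": ["a", "b"]}'
wanted = find_member(doc, "want")
number = find_member(doc, "n")
```

Here the nested object under "skip" is passed over as a single unit, including the closing brace that appears inside a string, and only the requested member is returned as raw text for later decoding.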
Images
The images above compare JSONSki's performance with state-of-the-art techniques: (top) execution time of different solutions as a function of data stream size; (bottom) memory footprint comparison.
JSON data querying is valuable in a wide variety of industries, including e-commerce, healthcare, manufacturing, media & entertainment, and retail, among others.
A working prototype demonstrates the effectiveness and efficiency of the invention. It has been tested on real-world datasets and evaluated against state-of-the-art solutions.
Patent Pending
json, parser, semi-structured data, bit-parallel algorithm, data analytics, streaming data, jsonski