Summary of Research in the CREDIT Center

    1. Big Data Storage and Computing Platform

      Example
      Seismic Data Analytics Cloud Platform

      In this project, we delivered the Seismic Data Analytics Cloud Platform (SAC) and built a typical fault detection application on top of it. Compared with traditional seismic data analytics platforms, SAC is built on Apache Spark, which lets it handle very large seismic data sets with good performance and scalability. All data sets are stored in HDFS. SAC provides an SDK that makes it easy to write parallel applications on a cluster: a large seismic volume is distributed across the memory of the whole cluster as a Spark RDD, and transformations can then be applied to the RDD in parallel. High-level APIs such as getLine (in the inline, crossline, and time/depth directions), getSubVolume, transpose, and applyMap are provided in the SAC SDK, so seismic data analytics applications can be written without dealing with the details of parallel programming. Most seismic attribute computations can be implemented easily within SAC; a sketch of this SDK-style parallelism appears after the task list below.

      Fault detection was selected as an application to verify the performance and usability of SAC. The slope attribute computation and the format transformations (from binary data to libsvm and TensorFlow formats) are carried out on SAC, with TensorFlow as the deep learning package. After a CNN is trained in TensorFlow and the model is saved, a new seismic data set can be fed into the model to predict faults; the prediction runs in parallel on SAC, and its speedup over the sequential code is substantial.

      Big data visualization is another challenging issue. We built a data server on SAC that caches all data in memory, so the data can be retrieved efficiently by the render server. On the client side, all user actions in the browser are sent to the render server through a web server. This architecture handles big data rendering with decent performance, and the 3D visualization results can be displayed in the browser on a thin client.

      Tasks:

      • Spark platform for various ML applications such as seismic data analytics;
      • Data Flow for parallel processing of big data
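      As an illustration of the applyMap-style parallelism described above, the following minimal PySpark sketch distributes a volume as inline slices and maps an attribute computation over them in parallel. All names here (load_inline_slice, slope_attribute, the slice shapes) are hypothetical stand-ins, not the actual SAC SDK.

        # Hypothetical sketch of the SAC-style "applyMap" pattern on Spark.
        # The data loading and the attribute computation are toy stand-ins.
        import numpy as np
        from pyspark import SparkContext

        sc = SparkContext(appName="sac-applymap-sketch")

        def load_inline_slice(inline_no):
            # Stand-in for reading one inline slice of the volume from HDFS;
            # here we fabricate a small random 2D array instead.
            rng = np.random.default_rng(inline_no)
            return inline_no, rng.standard_normal((200, 300))  # (crossline, time)

        def slope_attribute(slice_2d):
            # Toy "attribute": gradient along the time axis, standing in
            # for a real slope/dip computation.
            return np.gradient(slice_2d, axis=1)

        # Distribute the volume across the cluster, one inline slice per record.
        volume = sc.parallelize(range(500)).map(load_inline_slice)

        # applyMap-style transformation, executed in parallel on the cluster.
        slope = volume.mapValues(slope_attribute)

        # A trained fault-detection model could be broadcast and applied the
        # same way, e.g. volume.mapValues(lambda s: predict(bc_model.value, s)).
        print(slope.count(), "slices processed")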
    2. Big Data Analytics using Deep Learning

      Example
      Optimizing deep NN structures systematically and automatically

      In this research, we present the deep evolution neural network (DENN), a new deep learning framework based on agent learning and immune theories. Learning in the framework is realized through communication between neuron agents, implemented as state transitions of those agents. Meanwhile, the structure of the DENN is optimized automatically within the learning procedure.

      Tasks:

      • Optimizing deep NN structures systematically and automatically;
      • Designing deep learning models for highly unbalanced data sets (sparse data analytics);
      • Designing transfer-learning-based deep learning models
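      The DENN algorithm itself is not spelled out in this summary, so the sketch below only illustrates the general idea of optimizing a network structure automatically during learning, using a generic mutate-and-select loop over hidden-layer sizes (plain neuroevolution, not the DENN's agent/immune mechanism). The data set and fitness function are toys.

        # Generic neuroevolution sketch: evolve hidden-layer sizes of a tiny
        # MLP. This is NOT the DENN algorithm, only an illustration of
        # optimizing network structure automatically during learning.
        import numpy as np
        from sklearn.neural_network import MLPClassifier

        rng = np.random.default_rng(0)
        X = rng.standard_normal((256, 8))
        y = (X[:, 0] * X[:, 1] > 0).astype(int)  # toy binary target

        def fitness(hidden_sizes):
            # Train an MLP with the given structure, return validation accuracy.
            clf = MLPClassifier(hidden_layer_sizes=tuple(hidden_sizes),
                                max_iter=300, random_state=0)
            clf.fit(X[:192], y[:192])
            return clf.score(X[192:], y[192:])

        def mutate(hidden_sizes):
            # Randomly remove a layer, add a layer, or resize a layer.
            sizes = list(hidden_sizes)
            op = rng.integers(3)
            if op == 0 and len(sizes) > 1:
                sizes.pop(rng.integers(len(sizes)))
            elif op == 1:
                sizes.insert(rng.integers(len(sizes) + 1), int(rng.integers(4, 32)))
            else:
                i = rng.integers(len(sizes))
                sizes[i] = max(2, sizes[i] + int(rng.integers(-8, 9)))
            return sizes

        best, best_fit = [16], fitness([16])
        for _ in range(10):                      # mutate-and-select loop
            cand = mutate(best)
            f = fitness(cand)
            if f >= best_fit:
                best, best_fit = cand, f
        print("best structure:", best, "val acc:", round(best_fit, 3))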

    3. Real-time Decision Support using Streaming Analytics

      Example
      Multisensor Change Detection based on Big Time-Series Data and Dempster-Shafer Theory

      With the proliferation of the Internet of Things, numerous sensors are deployed to monitor phenomena that in many cases can be modeled by an underlying stochastic process. The goal is to detect a change in the process with a tolerable false alarm rate. In practice, sensors may have different accuracies and sensitivity ranges, or they may decay over time. As a result, the sensed data contain uncertainties and are sometimes conflicting. In this study, we propose a novel framework that exploits the ability of Dempster-Shafer theory to represent uncertainty in order to detect change and deal effectively with complementary hypotheses. Specifically, the Kullback-Leibler divergence is used as the metric for the distances between the estimated distribution and the pre-change and post-change distributions. Mass functions are calculated from those distance values for each sensor independently, and the Dempster-Shafer combination rule is applied to combine the mass values across all sensors. In the case of high conflict among sensor readings, the Dezert-Smarandache combination rule is applied instead, and the belief, plausibility, and pignistic probability are obtained for decision making. Simulation results using both synthetic and real data demonstrate the effectiveness of the proposed schemes.

      Tasks:

      • Big time-series data analytics;
      • Evidence theory based combination for decision support
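      To make the pipeline concrete, here is a minimal Python sketch of the two-hypothesis case: per-sensor KL distances to the pre- and post-change models are turned into mass functions, which are then fused with Dempster's rule, yielding belief, plausibility, and pignistic probability for the "change" hypothesis. The way masses are built from the distances is a plausible illustration rather than the paper's exact construction, and the high-conflict Dezert-Smarandache branch is omitted.

        # Minimal Dempster-Shafer sketch for two-hypothesis change detection.
        # Frame of discernment: {C} = "change", {N} = "no change".
        from itertools import product

        import numpy as np

        C, N = frozenset("C"), frozenset("N")
        THETA = C | N  # total ignorance: {C, N}

        def kl(p, q):
            # Kullback-Leibler divergence between discrete distributions.
            p, q = np.asarray(p, float), np.asarray(q, float)
            return float(np.sum(p * np.log(p / q)))

        def masses_from_kl(d_pre, d_post, alpha=0.8):
            # Smaller distance to the post-change model -> more mass on
            # "change"; alpha < 1 keeps some mass on THETA for uncertainty.
            w = (1.0 / d_post) / (1.0 / d_pre + 1.0 / d_post)
            return {C: alpha * w, N: alpha * (1.0 - w), THETA: 1.0 - alpha}

        def dempster(m1, m2):
            # Dempster's rule: conjunctive combination, then renormalize
            # by the total conflict mass.
            combined, conflict = {}, 0.0
            for (a, ma), (b, mb) in product(m1.items(), m2.items()):
                if a & b:
                    combined[a & b] = combined.get(a & b, 0.0) + ma * mb
                else:
                    conflict += ma * mb
            return {k: v / (1.0 - conflict) for k, v in combined.items()}, conflict

        pre, post = [0.7, 0.3], [0.2, 0.8]       # pre/post-change models
        est1, est2 = [0.25, 0.75], [0.30, 0.70]  # per-sensor estimated distributions
        m1 = masses_from_kl(kl(est1, pre), kl(est1, post))
        m2 = masses_from_kl(kl(est2, pre), kl(est2, post))
        fused, K = dempster(m1, m2)

        bel = fused.get(C, 0.0)                                       # belief
        pl = sum(v for k, v in fused.items() if k & C)                # plausibility
        betp = sum(v / len(k) for k, v in fused.items() if "C" in k)  # pignistic
        print(f"conflict={K:.3f} Bel(C)={bel:.3f} Pl(C)={pl:.3f} BetP(C)={betp:.3f}")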
    4. Big Data Visualization

      Tasks:

      • 3D big data visualization for real-time web applications (based on the Big Data Storage and Computing Platform); a minimal request-flow sketch follows
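      The web-based rendering pipeline described in Section 1 (browser sends user actions through a web server to a render server, which pulls from an in-memory data server) can be sketched as follows using only the Python standard library. The endpoint shape, the JSON action format, and the fake in-memory cache are hypothetical stand-ins, not the CREDIT implementation.

        # Toy "render server": receives user actions forwarded by the web
        # server and returns data for the browser to draw. Illustrative only;
        # the real system renders full 3D scenes for a thin client.
        import json
        from http.server import BaseHTTPRequestHandler, HTTPServer

        import numpy as np

        # Stand-in for the SAC data server: the whole volume cached in memory.
        VOLUME = np.random.default_rng(0).standard_normal((100, 200, 300))

        class RenderHandler(BaseHTTPRequestHandler):
            def do_POST(self):
                # The web server forwards browser actions as JSON,
                # e.g. {"action": "slice", "inline": 42}.
                length = int(self.headers.get("Content-Length", 0))
                action = json.loads(self.rfile.read(length))
                if action.get("action") == "slice":
                    img = VOLUME[action["inline"] % VOLUME.shape[0]]
                    payload = img.astype(np.float32).tobytes()
                    self.send_response(200)
                    self.send_header("Content-Type", "application/octet-stream")
                    self.send_header("Content-Length", str(len(payload)))
                    self.end_headers()
                    self.wfile.write(payload)  # thin client renders in-browser
                else:
                    self.send_response(400)
                    self.end_headers()

        if __name__ == "__main__":
            HTTPServer(("localhost", 8765), RenderHandler).serve_forever()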