Add AI service with microphone and camera input support

2026-03-18 12:54:00 +08:00 · 2026-03-18 12:54:00 +08:00 · d09d6f1cc0
commit d09d6f1cc0
759 changed files with 240072 additions and 0 deletions
--- a/.gitignore
+++ b/.gitignore
@ -0,0 +1,31 @@
+*.iml
+.gradle
+/local.properties
+/.idea/
+.DS_Store
+/build/
+*/build/
+*/*/build/
+*/debug/
+*/release/
+/captures
+.externalNativeBuild
+.cxx
+*.bat
+*.apk
+output-metadata.json
+# If your app uses zip files, please ignore this
+# Remove this entry if you have customized local.properties
+local.properties
+~$*
+gradlew
+.idea
+# Remove this entry if you have customized gradle/gradle-wrapper.properties in the root directory (customizing it is not recommended)
+#gradle
+build
+
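+# Prebuilt static libraries and third-party binaries (opencv-mobile, etc.)
+# These files are large binaries and should be obtained from their release sources rather than committed to version control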
+*.a
+*.so
+duix-sdk/src/main/cpp/third/opencv-mobile-*/
--- a/README.md
+++ b/README.md
@ -0,0 +1,405 @@
+# Duix Mobile for Android SDK Documentation
+
+English | [中文](./README_zh.md)
+
+## 1. Product Overview
+
+`Duix Mobile for Android` is a lightweight, fully offline 2D digital human solution for Android, supporting real-time rendering of digital avatars driven by voice audio.
+
+### 1.1 Application Scenarios
+
+- **Low deployment cost**: Suitable for unattended scenarios such as large-screen terminals, government halls, and banks.
+- **Minimal network dependency**: Runs entirely locally, no internet required, stable operation in subways and remote areas.
+- **Diverse functionality**: Can serve as a guide, Q&A customer service, intelligent companion, and more.
+
+### 1.2 Core Features
+
+- Customizable digital avatar and local rendering
+- Real-time voice-driven playback (supports WAV playback and PCM streaming)
+- Motion playback control (specific or random actions)
+- Automatic resource download management
+
+---
+
+## 2. Terminology
+
+| Term               | Meaning                                                                    |
+|--------------------|----------------------------------------------------------------------------|
+| PCM                | Pulse-Code Modulation; raw audio stream at 16 kHz sample rate, 16-bit depth, mono |
+| WAV                | An audio file format that supports PCM encoding; suitable for short voice clips |
+| RenderSink         | Interface that receives rendered frame data; the SDK provides a default implementation, and you can implement it for custom rendering |
+| DUIX               | Main control object of the digital human, integrates model loading, rendering, broadcasting, and motion control |
+| GLES               | OpenGL ES, a graphics interface for rendering images on Android               |
+| SpecialAction      | A JSON file attached to the model that marks action intervals (e.g., greetings, waving) |
+
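Because the PCM format is fixed, you can convert between byte counts and durations with simple arithmetic. At 16,000 samples/s × 2 bytes × 1 channel, one second is 32,000 bytes, so each 320-byte chunk used in the examples is 10 ms. The sketch below is illustrative only; `PcmMath` is a hypothetical helper, not part of the SDK.

```java
/** Byte/duration arithmetic for 16 kHz, 16-bit, mono PCM (the format this SDK expects). */
final class PcmMath {
    static final int SAMPLE_RATE_HZ = 16_000;
    static final int BYTES_PER_SAMPLE = 2; // 16-bit depth
    static final int CHANNELS = 1;         // mono
    static final int BYTES_PER_SECOND = SAMPLE_RATE_HZ * BYTES_PER_SAMPLE * CHANNELS; // 32000

    private PcmMath() {}

    /** Playback duration in milliseconds of a PCM buffer of the given size. */
    static long durationMs(long byteCount) {
        return byteCount * 1000L / BYTES_PER_SECOND;
    }

    /** Number of bytes needed for the given duration, rounded down to a whole sample frame. */
    static long bytesFor(long durationMs) {
        long bytes = durationMs * BYTES_PER_SECOND / 1000L;
        return bytes - (bytes % (BYTES_PER_SAMPLE * CHANNELS));
    }
}
```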
+---
+
+## 3. SDK Access
+
+### 3.1 Module Reference (Recommended)
+
+1. Obtain the complete source package, unzip it, and copy the `duix-sdk` directory to the project root directory.
+2. In the project `settings.gradle`, add:
+
+```gradle
+include ':duix-sdk'
+```
+
+3. In the module's `build.gradle`, add the dependency:
+
+```gradle
+dependencies {
+    api project(":duix-sdk")
+}
+```
+
+### 3.2 AAR Reference (Optional)
+
+1. Place the `duix-sdk-release.aar` built from the `duix-sdk` module into the `libs/` directory.
+2. Add the dependency:
+
+```gradle
+dependencies {
+    api fileTree(include: ['*.jar', '*.aar'], dir: 'libs')
+}
+```
+
+---
+
+## 4. Integration Requirements
+
+| Item           | Description                                                     |
+|----------------|-----------------------------------------------------------------|
+| System         | Android 10 or later                                             |
+| CPU Architecture | armeabi-v7a, arm64-v8a                                           |
+| Hardware Requirements | CPU with 8 or more cores (e.g., Snapdragon 8 Gen 2), 8 GB or more RAM, 1 GB or more free storage |
+| Network        | None (runs fully on-device)                                     |
+| Development IDE | Android Studio Giraffe 2022.3.1 Patch 2                         |
+| Memory Requirements | At least 800 MB of memory available to the digital human     |
+
+**The Gradle build uses JDK 17. Set it under File -> Settings -> Build, Execution, Deployment -> Build Tools -> Gradle -> Gradle JDK.**
+
+---
+
+## 5. Usage Flow Overview
+
+```mermaid
+graph TD
+A[Check Configuration and Models] --> B[Build DUIX Instance]
+B --> C[Call init to Initialize]
+C --> D[Display Avatar / Render]
+D --> E[PCM or WAV Audio Driving]
+E --> F[Playback Control & Motion Triggering]
+F --> G[Resource Release]
+```
+
+---
+
+## 6. Key Interfaces and Example Calls
+
+### 6.1 Model Check and Download
+
+Before using the rendering service, make sure the base configuration and model files are available in local storage. The SDK's `VirtualModelUtil` provides a simple reference implementation of downloading and unzipping them. If downloads are slow or fail, you can host the model packages on your own storage service.
+
+> Function Definition: `ai.guiji.duix.sdk.client.VirtualModelUtil`
+
+```
+// Check if base configuration is downloaded
+boolean checkBaseConfig(Context context)
+
+// Check if the model is downloaded
+boolean checkModel(Context context, String name)
+
+// Base configuration download
+void baseConfigDownload(Context context, String url, ModelDownloadCallback callback)
+
+// Model download
+void modelDownload(Context context, String modelUrl, ModelDownloadCallback callback)
+```
+
+`ModelDownloadCallback` reports download progress, unzip progress, completion, and failure:
+
+```
+interface ModelDownloadCallback {
+    // Download progress
+    void onDownloadProgress(String url, long current, long total);
+    // Unzip progress
+    void onUnzipProgress(String url, long current, long total);
+    // Download and unzip complete
+    void onDownloadComplete(String url, File dir);
+    // Download and unzip failed
+    void onDownloadFail(String url, int code, String msg);
+}
+```
+
+**Call Example**:
+
+```kotlin
+if (!VirtualModelUtil.checkBaseConfig(mContext)){
+    VirtualModelUtil.baseConfigDownload(mContext, baseConfigUrl, callback)
+}
+```
+
+```kotlin
+if (!VirtualModelUtil.checkModel(mContext, modelUrl)){
+    VirtualModelUtil.modelDownload(mContext, modelUrl, callback)
+}
+```
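The following is a minimal sketch of a callback that converts the reported byte counts into a percentage. The interface is redeclared locally from the signature above so the snippet compiles without the SDK. `PercentProgressCallback` and its fields are hypothetical. The sketch assumes callbacks may arrive on a background thread, which is why the fields are `volatile`. Post to the main thread before you update any views.

```java
import java.io.File;

// Local copy of the documented interface so this sketch compiles without the SDK.
interface ModelDownloadCallback {
    void onDownloadProgress(String url, long current, long total);
    void onUnzipProgress(String url, long current, long total);
    void onDownloadComplete(String url, File dir);
    void onDownloadFail(String url, int code, String msg);
}

/** Hypothetical callback that turns byte counts into a 0-100 percentage for each phase. */
class PercentProgressCallback implements ModelDownloadCallback {
    volatile String phase = "idle"; // "download", "unzip", "done" or "failed"
    volatile int percent = -1;      // -1 while the total size is unknown
    volatile File modelDir;         // set when download and unzip succeed
    volatile String error;          // set on failure

    /** Returns 0..100, or -1 if the total is unknown (<= 0). */
    static int percentOf(long current, long total) {
        if (total <= 0) return -1;
        return (int) Math.max(0, Math.min(100, current * 100 / total));
    }

    @Override public void onDownloadProgress(String url, long current, long total) {
        phase = "download";
        percent = percentOf(current, total);
    }

    @Override public void onUnzipProgress(String url, long current, long total) {
        phase = "unzip";
        percent = percentOf(current, total);
    }

    @Override public void onDownloadComplete(String url, File dir) {
        phase = "done";
        percent = 100;
        modelDir = dir;
    }

    @Override public void onDownloadFail(String url, int code, String msg) {
        phase = "failed";
        error = code + ": " + msg;
    }
}
```

The download phase and the unzip phase each run from 0 to 100. A UI should therefore show `phase` alongside `percent`, rather than treating the percentage as a single overall progress value.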
+
+---
+
+### 6.2 Initialization and Rendering Start
+
+In the rendering page's `onCreate()`, construct the `DUIX` object and call `init()`.
+
+> Function Definition: `ai.guiji.duix.sdk.client.DUIX`
+
+```
+// Build DUIX object
+public DUIX(Context context, String modelName, RenderSink sink, Callback callback)
+
+// Initialize DUIX service
+void init()
+```
+
+**DUIX Constructor Parameters**:
+
+| Parameter     | Type      | Description                                                    |
+|---------------|-----------|----------------------------------------------------------------|
+| context       | Context   | System context                                                  |
+| modelName     | String    | The model download URL (the model must already be downloaded) or the cached model file name |
+| sink          | RenderSink| Rendering data interface; the SDK's default rendering component implements it, or you can implement your own |
+| callback      | Callback  | Various callback events handled by the SDK                      |
+
+Where **Callback** is defined as: `ai.guiji.duix.sdk.client.Callback`
+
+```
+interface Callback {
+    void onEvent(String event, String msg, Object info);
+}
+```
+
+**Call Example**:
+
+```kotlin
+duix = DUIX(mContext, modelUrl, mDUIXRender) { event, msg, info ->
+    when (event) {
+        ai.guiji.duix.sdk.client.Constant.CALLBACK_EVENT_INIT_READY -> {
+            initOK()
+        }
+
+        ai.guiji.duix.sdk.client.Constant.CALLBACK_EVENT_INIT_ERROR -> {
+            initError()
+        }
+        // ...
+    }
+}
+// Initialization is asynchronous; the result is delivered to the callback
+duix?.init()
+```
+
+Check the initialization result in the callback (`CALLBACK_EVENT_INIT_READY` or `CALLBACK_EVENT_INIT_ERROR`).
+
+---
+
+### 6.3 Digital Human Avatar Display
+
+Use the SDK-provided `DUIXRenderer` and `DUIXTextureView` to quickly implement rendering with transparency support. Alternatively, you can implement the `RenderSink` interface to customize the rendering logic.
+
+The **RenderSink** definition is as follows: `ai.guiji.duix.sdk.client.render.RenderSink`
+
+```java
+/**
+ * Rendering pipeline, returns rendering data through this interface
+ */
+public interface RenderSink {
+
+    // The frame's buffer data is arranged in BGR order
+    void onVideoFrame(ImageFrame imageFrame);
+
+}
+```
+
+**Call Example**:
+
+Use `DUIXRenderer` and `DUIXTextureView` for simple rendering. These views support transparency, so you can freely set the background and foreground.
+
+```kotlin
+override fun onCreate(savedInstanceState: Bundle?) {
+    super.onCreate(savedInstanceState)
+    // ...
+    mDUIXRender =
+        DUIXRenderer(
+            mContext,
+            binding.glTextureView
+        )
+
+    binding.glTextureView.setEGLContextClientVersion(GL_CONTEXT_VERSION)
+    binding.glTextureView.setEGLConfigChooser(8, 8, 8, 8, 16, 0) // RGBA8888 with an alpha channel for transparency
+    binding.glTextureView.isOpaque = false           // Let content behind the view show through
+    binding.glTextureView.setRenderer(mDUIXRender)
+    binding.glTextureView.renderMode =
+        GLSurfaceView.RENDERMODE_WHEN_DIRTY      // Must be called after setting the renderer
+
+    duix = DUIX(mContext, modelUrl, mDUIXRender) { event, msg, _ ->
+    }
+    // ...
+}
+```
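If you implement `RenderSink` yourself, note that the frame buffer is in BGR order, whereas Android's `Bitmap` expects ARGB colors. This sketch makes two assumptions. First, the buffer is tightly packed at 3 bytes per pixel with no row padding. Second, because the accessors on `ImageFrame` are not documented here, the width, height, and bytes are passed as plain arguments. `BgrConverter` is a hypothetical helper.

```java
/** Converts tightly packed BGR bytes (3 bytes per pixel) into opaque ARGB ints. */
final class BgrConverter {
    private BgrConverter() {}

    static int[] bgrToArgb(byte[] bgr, int width, int height) {
        int pixels = width * height;
        if (bgr.length < pixels * 3) {
            throw new IllegalArgumentException("buffer too small for " + width + "x" + height);
        }
        int[] argb = new int[pixels];
        for (int i = 0, j = 0; i < pixels; i++, j += 3) {
            int b = bgr[j] & 0xFF;      // mask: Java bytes are signed, pixels are 0..255
            int g = bgr[j + 1] & 0xFF;
            int r = bgr[j + 2] & 0xFF;
            argb[i] = 0xFF000000 | (r << 16) | (g << 8) | b; // fully opaque
        }
        return argb;
    }
}
```

You can pass the result to `Bitmap.createBitmap(argb, width, height, Bitmap.Config.ARGB_8888)`. The sketch makes every pixel opaque, so it ignores any transparency information the SDK may deliver separately.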
+
+---
+
+### 6.4 Speech Playback Control
+
+#### Driving Speech with Streaming PCM
+
+**PCM format: 16 kHz sample rate, mono, 16-bit depth**
+
+> Function Definition: `ai.guiji.duix.sdk.client.DUIX`
+
+```
+// Notify service to start pushing audio
+void startPush()
+
+// Push PCM data
+void pushPcm(byte[] buffer)
+
+// End the current audio segment. Call this as soon as all data has been pushed; do not wait for playback to finish.
+void stopPush()
+```
+
+Every `startPush` must be matched by a `stopPush`, with `pushPcm` calls in between, and each `pushPcm` chunk should be short. After pushing a complete utterance, call `stopPush` to end the session, then call `startPush` again for the next one.
+
+**Each segment between `startPush` and `stopPush` must contain at least 1 second of audio (32000 bytes); otherwise lip-sync is not triggered. Pad shorter segments with silence.**
+
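To pad a short segment to the 1-second minimum, append zero bytes, because 0 is silence in signed 16-bit PCM. This is a minimal sketch. `PcmPadding` is a hypothetical helper, and padding at the end rather than the start is our own choice.

```java
import java.util.Arrays;

/** Pads a PCM segment with trailing silence up to the 1-second minimum. */
final class PcmPadding {
    /** 1 s of 16 kHz, 16-bit, mono PCM: 16,000 samples x 2 bytes. */
    static final int MIN_SEGMENT_BYTES = 32_000;

    private PcmPadding() {}

    /** Returns pcm itself if it is long enough, otherwise a zero-padded copy. */
    static byte[] padToMinimum(byte[] pcm) {
        if (pcm.length >= MIN_SEGMENT_BYTES) return pcm;
        // Arrays.copyOf fills the new tail with zeros, which is silence in signed 16-bit PCM.
        return Arrays.copyOf(pcm, MIN_SEGMENT_BYTES);
    }
}
```

If audio arrives incrementally, you can instead keep a running count of pushed bytes. Then push `32000 - pushed` zero bytes just before `stopPush()`.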
+**Call Example**:
+
+```kotlin
+val thread = Thread {
+    duix?.startPush()
+    try {
+        assets.open("pcm/2.pcm").use { inputStream ->
+            // 320 bytes = 160 samples = 10 ms of 16 kHz, 16-bit mono PCM
+            val buffer = ByteArray(320)
+            var length = 0
+            while (inputStream.read(buffer).also { length = it } > 0) {
+                duix?.pushPcm(buffer.copyOfRange(0, length))
+            }
+        }
+    } finally {
+        duix?.stopPush() // end the segment even if reading fails
+    }
+}
+thread.start()
+```
+
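Many text-to-speech engines produce floating-point samples rather than the 16-bit PCM that `pushPcm` expects. If your samples are already 16 kHz mono in the range [-1, 1], conversion is a per-sample scale and clamp. This sketch also assumes little-endian byte order, which is the usual layout for 16-bit PCM on Android; the document does not state the byte order explicitly. `PcmEncoder` is hypothetical, and resampling from other rates is out of scope.

```java
/** Converts float samples in [-1, 1] (16 kHz mono assumed) to 16-bit little-endian PCM bytes. */
final class PcmEncoder {
    private PcmEncoder() {}

    static byte[] floatToPcm16le(float[] samples) {
        byte[] out = new byte[samples.length * 2];
        for (int i = 0; i < samples.length; i++) {
            // Clamp so out-of-range samples saturate instead of wrapping around.
            float s = Math.max(-1f, Math.min(1f, samples[i]));
            short v = (short) Math.round(s * Short.MAX_VALUE);
            out[2 * i] = (byte) (v & 0xFF);            // low byte first (little-endian)
            out[2 * i + 1] = (byte) ((v >> 8) & 0xFF); // high byte
        }
        return out;
    }
}
```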
+---
+
+### 6.5 Motion Control
+
+#### Play Specific Motion Interval
+
+Models can define named motion intervals in `SpecialAction.json`.
+
+> Function Definition: `ai.guiji.duix.sdk.client.DUIX`
+
+```
+/**
+ * Play specific motion interval
+ * @param name The motion interval name; available names can be obtained from {@link ModelInfo#getSilenceRegion()} after the init callback reports success
+ * @param now true to play immediately; false to start after the current silent or motion interval finishes
+ */
+void startMotion(String name, boolean now)
+```
+
+**Call Example**:
+
+```kotlin
+duix?.startMotion("Greeting", true)
+```
+
+#### Randomly Play Motion Interval
+
+> Function Definition: `ai.guiji.duix.sdk.client.DUIX`
+
+```
+/**
+ * Randomly play a motion interval
+ * @param now true to play immediately; false to start after the current silent or motion interval finishes
+ */
+void startRandomMotion(boolean now);
+```
+
+**Call Example**:
+
+```kotlin
+duix?.startRandomMotion(true)
+```
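`startMotion` needs a name the model actually defines. A caller can therefore check the requested name against the intervals reported after init, and fall back to `startRandomMotion` if the name is missing. This sketch rests on two assumptions. `MotionPlayer` is a hypothetical interface that mirrors the two `DUIX` methods. The `available` names are collected from `ModelInfo` in the init callback; their exact accessor and type are not documented here.

```java
import java.util.Collection;

/** Hypothetical mirror of the two DUIX motion methods, so the sketch compiles without the SDK. */
interface MotionPlayer {
    void startMotion(String name, boolean now);
    void startRandomMotion(boolean now);
}

final class MotionHelper {
    private MotionHelper() {}

    /**
     * Plays the named interval if the model defines it, otherwise a random one.
     * Returns true if the named interval was requested.
     */
    static boolean playNamedOrRandom(MotionPlayer duix, Collection<String> available,
                                     String name, boolean now) {
        if (name != null && available.contains(name)) {
            duix.startMotion(name, now);
            return true;
        }
        duix.startRandomMotion(now);
        return false;
    }
}
```

In an app, a thin adapter that forwards to the real `DUIX` instance can implement `MotionPlayer`.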
+
+---
+
+## 7. Proguard Configuration
+
+If using obfuscation, add the following in `proguard-rules.pro`:
+
+```proguard
+-keep class ai.guiji.duix.DuixNcnn{*; }
+```
+
+---
+
+## 8. Precautions
+
+1. Before initializing rendering, make sure the base configuration and model files have been downloaded to the expected location.
+2. Keep PCM audio reasonably short: pushed PCM is buffered in memory, and very long streams can cause out-of-memory errors.
+3. To switch the preview model, change `modelUrl` in `MainActivity.kt`; the SDK's built-in download and unzip management then fetches the complete model files.
+4. Audio driving format: 16 kHz sample rate, mono, 16-bit depth.
+5. On underpowered devices, audio feature extraction may fall behind playback. Use `duix?.setReporter()` to monitor frame-rendering information.
+
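One way to check whether rendering keeps up is to measure the delivered frame rate yourself. This sketch estimates frames per second over a sliding time window. `FrameRateMonitor` is hypothetical. Call `onFrame(System.nanoTime())` wherever frames arrive, for example in a custom `RenderSink.onVideoFrame`; the signature of the `setReporter()` callback is not documented here.

```java
import java.util.ArrayDeque;

/** Estimates frames per second from frame timestamps within a sliding time window. */
final class FrameRateMonitor {
    private final long windowNanos;
    private final ArrayDeque<Long> timestamps = new ArrayDeque<>();

    FrameRateMonitor(long windowNanos) {
        if (windowNanos <= 0) throw new IllegalArgumentException("window must be positive");
        this.windowNanos = windowNanos;
    }

    /** Records a frame shown at timeNanos (e.g. System.nanoTime()) and returns the current estimate. */
    synchronized double onFrame(long timeNanos) {
        timestamps.addLast(timeNanos);
        // Discard frames older than the window; the newest frame always stays.
        while (timeNanos - timestamps.peekFirst() > windowNanos) {
            timestamps.removeFirst();
        }
        long span = timeNanos - timestamps.peekFirst();
        if (timestamps.size() < 2 || span <= 0) return 0.0;
        // n timestamps bound n - 1 frame intervals.
        return (timestamps.size() - 1) * 1_000_000_000.0 / span;
    }
}
```

Compare the estimate with the frame rate your model is meant to play at. A sustained shortfall suggests the device cannot keep up.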
+---
+
+## 9. FAQ and Troubleshooting Guide
+
+| Issue                          | Possible Cause               | Solution                     |
+|---------------------------------|------------------------------|------------------------------|
+| init callback failed            | Wrong model path or model not downloaded | Use `checkModel` to check the model status |
+| Black screen when rendering     | Wrong EGL configuration or texture view setup | Use the settings from the SDK sample |
+| No speech when pushing PCM      | Wrong audio format or `startPush` not called | Verify the audio format and call the push methods in order |
+| Slow model download             | Unstable network or restricted CDN | Host the model files on your own storage service |
+
+---
+
+## 10. Version History
+
+**<a>4.0.1</a>**
+
+1. Supports driving the digital human with a PCM audio stream, improving audio playback response time.
+2. Optimized motion interval playback; specific motion intervals can be played based on the model configuration.
+3. Replaced ExoPlayer with a custom audio player, removing the ExoPlayer dependency.
+4. Provided simple model download and synchronization management tools.
+5. Each audio segment between `startPush` and `stopPush` must contain at least 1 second of audio (32000 bytes); otherwise lip-sync is not triggered. Pad shorter segments with silence.
+
+**<a>3.0.5</a>**
+
+```text
+1. Updated arm32 CPU libonnxruntime.so version to fix compatibility issues.
+2. Reworked motion interval playback: supports random and sequential playback; you must explicitly stop motion playback to return to the silent interval.
+```
+
+**<a>3.0.4</a>**
+
+```text
+1. Fixed avatars not displaying correctly on some devices due to low default GL float precision.
+```
+
+**<a>3.0.3</a>**
+
+```text
+1. Optimized local rendering.
+```
+
+## 11. 🔗 Open-source Dependencies
+
+| Module                                   | Description                    |
+|------------------------------------------|--------------------------------|
+| [onnx](https://github.com/onnx/onnx)     | General AI model standard format |
+| [ncnn](https://github.com/Tencent/ncnn)  | High-performance neural network computing framework (Tencent) |
+
+---
+
+For more help, please contact the technical support team.
--- a/README_zh.md
+++ b/README_zh.md
@ -0,0 +1,485 @@
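+# Duix Mobile for Android SDK Documentation
+
+Chinese | [English](./README.md)
+
+## 1. Product Overview
+
+`Duix Mobile for Android` is a lightweight, fully offline 2D digital human solution for Android that drives a digital avatar with voice audio and renders it in real time.
+
+### 1.1 Application Scenarios
+
+- **Low deployment cost**: Suitable for unattended scenarios such as large-screen terminals, government service halls, and banks.
+- **Minimal network dependency**: Runs entirely on-device with no internet connection, so it works reliably in subways and remote areas.
+- **Versatile**: Can serve as a tour guide, Q&A customer service agent, intelligent companion, and more.
+
+### 1.2 Core Features
+
+- Custom digital avatars with local rendering
+- Real-time voice-driven speech (supports WAV playback and PCM streaming)
+- Motion playback control (specific or random motions)
+- Automatic resource download management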
+
+---
+
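+## 2. Terminology
+
+| Term              | Meaning |
+|-------------------|---------|
+| PCM               | Pulse-Code Modulation; raw audio stream at 16 kHz sample rate, 16-bit depth, mono |
+| WAV               | An audio file format that supports PCM encoding; suitable for short voice clips |
+| RenderSink        | Interface that receives rendered frame data; the SDK provides an implementation, and you can implement it for custom rendering |
+| DUIX              | Main control object of the digital human, integrating model loading, rendering, speech playback, and motion control |
+| GLES              | OpenGL ES, the graphics API used for rendering on Android |
+| SpecialAction     | JSON file shipped with a model that marks motion intervals (e.g., greeting, waving) |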
+
+---
+
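+## 3. Obtaining the SDK
+
+### 3.1 Module Reference (Recommended)
+
+1. Obtain the complete source package, unzip it, and copy the `duix-sdk` directory to the project root.
+2. In the project's `settings.gradle`, add: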
+
+```gradle
+include ':duix-sdk'
+```
+
+3. In the module's `build.gradle`, add the dependency:
+
+```gradle
+dependencies {
+    api project(":duix-sdk")
+}
+```
+
+### 3.2 AAR Reference (Optional)
+
+1. Place the `duix-sdk-release.aar` built from the `duix-sdk` module into the `libs/` directory.
+2. Add the dependency:
+
+```gradle
+dependencies {
+    api fileTree(include: ['*.jar', '*.aar'], dir: 'libs')
+}
+```
+
+---
+
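+## 4. Integration Requirements
+
+| Item                  | Description |
+|-----------------------|-------------|
+| System                | Android 10 or later |
+| CPU architecture      | armeabi-v7a, arm64-v8a |
+| Hardware requirements | CPU with 8 or more cores (e.g., Snapdragon 8 Gen 2), 8 GB or more RAM, 1 GB or more free storage |
+| Network               | None (runs fully on-device) |
+| Development IDE       | Android Studio Giraffe 2022.3.1 Patch 2 |
+| Memory requirements   | At least 800 MB of memory available to the digital human |
+
+**The project's Gradle build uses JDK 17. Set it under File -> Settings -> Build, Execution, Deployment -> Build Tools -> Gradle -> Gradle JDK (select a JDK 17 installation).**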
+
+---
+
+## 5. Usage Flow Overview
+
+```mermaid
+graph TD
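+A[Check Configuration and Models] --> B[Build DUIX Instance]
+B --> C[Call init to Initialize]
+C --> D[Display Avatar / Render]
+D --> E[PCM or WAV Audio Driving]
+E --> F[Playback Control & Motion Triggering]
+F --> G[Resource Release]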
+```
+
+---
+
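+## 6. Key Interfaces and Example Calls
+
+### 6.1 Model Check and Download
+
+Before using the rendering service, make sure the base configuration and model files are available in local storage. The SDK's `VirtualModelUtil` provides a simple reference implementation of downloading and unzipping them.
+If downloads are slow or fail, you can host the model packages on your own storage service.
+
+> Function definition: `ai.guiji.duix.sdk.client.VirtualModelUtil`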
+
+```
+// Check whether the base configuration has been downloaded
+boolean checkBaseConfig(Context context)
+
+// Check whether the model has been downloaded
+boolean checkModel(Context context, String name)
+
+// Download the base configuration
+void baseConfigDownload(Context context, String url, ModelDownloadCallback callback)
+
+// Download a model
+void modelDownload(Context context, String modelUrl, ModelDownloadCallback callback)
+```
+
+`ModelDownloadCallback` reports download progress, unzip progress, completion, and failure; see the SDK for details.
+
+```
+interface ModelDownloadCallback {
+    // Download progress
+    void onDownloadProgress(String url, long current, long total);
+    // Unzip progress
+    void onUnzipProgress(String url, long current, long total);
+    // Download and unzip complete
+    void onDownloadComplete(String url, File dir);
+    // Download or unzip failed
+    void onDownloadFail(String url, int code, String msg);
+}
+```
+
+**Call Example**:
+
+```kotlin
+if (!VirtualModelUtil.checkBaseConfig(mContext)){
+    VirtualModelUtil.baseConfigDownload(mContext, baseConfigUrl, callback)
+}
+```
+
+```kotlin
+if (!VirtualModelUtil.checkModel(mContext, modelUrl)){
+    VirtualModelUtil.modelDownload(mContext, modelUrl, callback)
+}
+
+```
+
+---
+
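+### 6.2 Initialization and Rendering Start
+
+In the rendering page's `onCreate()`, construct the `DUIX` object and call `init()`.
+
+> Function definition: `ai.guiji.duix.sdk.client.DUIX`
+
+```
+// Construct the DUIX object
+public DUIX(Context context, String modelName, RenderSink sink, Callback callback)
+
+// Initialize the DUIX service
+void init()
+```
+
+**DUIX Constructor Parameters**: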
+
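+| Parameter  | Type       | Description |
+|------------|------------|-------------|
+| context    | Context    | System context |
+| modelName  | String     | The model download URL (the model must already be downloaded) or the cached model file name |
+| sink       | RenderSink | Rendering data interface; the SDK's default rendering component implements it, or you can implement your own |
+| callback   | Callback   | Receives the SDK's callback events |
+
+
+Where **Callback** is defined as: `ai.guiji.duix.sdk.client.Callback`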
+
+```
+interface Callback {
+    void onEvent(String event, String msg, Object info);
+}
+```
+
+**Call Example**:
+
+```kotlin
+duix = DUIX(mContext, modelUrl, mDUIXRender) { event, msg, info ->
+    when (event) {
+        ai.guiji.duix.sdk.client.Constant.CALLBACK_EVENT_INIT_READY -> {
+            initOK()
+        }
+
+        ai.guiji.duix.sdk.client.Constant.CALLBACK_EVENT_INIT_ERROR -> {
+            initError()
+        }
+        // ...
+
+    }
+}
+// Initialization is asynchronous; the result is delivered to the callback
+duix?.init()
+```
+
+Check the initialization result in the callback (`CALLBACK_EVENT_INIT_READY` or `CALLBACK_EVENT_INIT_ERROR`).
+
+---
+
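+### 6.3 Digital Human Avatar Display
+
+The SDK's `DUIXRenderer` and `DUIXTextureView` provide rendering with alpha-channel (transparency) support out of the box. You can also implement the `RenderSink` interface to customize rendering.
+
+The **RenderSink** definition is as follows: `ai.guiji.duix.sdk.client.render.RenderSink`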
+
+```java
+/**
+ * Rendering pipeline; rendered frame data is delivered through this interface
+ */
+public interface RenderSink {
+
+    // Pixel data in the frame's buffer is in BGR order
+    void onVideoFrame(ImageFrame imageFrame);
+
+}
+```
+
+**Call Example**:
+
+Use `DUIXRenderer` and `DUIXTextureView` for simple rendering. These views support transparency, so you can freely set the background and foreground.
+
+```kotlin
+override fun onCreate(savedInstanceState: Bundle?) {
+    super.onCreate(savedInstanceState)
+    // ...
+    mDUIXRender =
+        DUIXRenderer(
+            mContext,
+            binding.glTextureView
+        )
+
+    binding.glTextureView.setEGLContextClientVersion(GL_CONTEXT_VERSION)
+    binding.glTextureView.setEGLConfigChooser(8, 8, 8, 8, 16, 0) // RGBA8888 with an alpha channel for transparency
+    binding.glTextureView.isOpaque = false           // Let content behind the view show through
+    binding.glTextureView.setRenderer(mDUIXRender)
+    binding.glTextureView.renderMode =
+        GLSurfaceView.RENDERMODE_WHEN_DIRTY      // Must be set after setRenderer()
+
+    duix = DUIX(mContext, modelUrl, mDUIXRender) { event, msg, _ ->
+    }
+    // ...
+}
+```
+
+---
+
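+### 6.4 Speech Playback Control
+
+#### Driving Speech with Streaming PCM
+
+**PCM format: 16 kHz sample rate, mono, 16-bit depth**
+
+> Function definition: `ai.guiji.duix.sdk.client.DUIX`
+
+```
+// Notify the service that audio pushing is starting
+void startPush()
+
+// Push PCM data
+void pushPcm(byte[] buffer)
+
+// End the current audio segment (call this as soon as all data has been pushed; do not wait for playback to finish)
+void stopPush()
+
+```
+
+Every `startPush` must be matched by a `stopPush`, with `pushPcm` calls in between, and each `pushPcm` chunk should be short. After pushing a complete utterance, call `stopPush` to end the current session, then call `startPush` again for the next one.
+
+**Each segment between `startPush` and `stopPush` must contain at least 1 second of audio (32000 bytes); otherwise lip-sync is not triggered. Pad shorter segments with silence.**
+
+**Call Example**: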
+
+```kotlin
+val thread = Thread {
+    duix?.startPush()
+    try {
+        assets.open("pcm/2.pcm").use { inputStream ->
+            // 320 bytes = 160 samples = 10 ms of 16 kHz, 16-bit mono PCM
+            val buffer = ByteArray(320)
+            var length = 0
+            while (inputStream.read(buffer).also { length = it } > 0) {
+                duix?.pushPcm(buffer.copyOfRange(0, length))
+            }
+        }
+    } finally {
+        duix?.stopPush() // end the segment even if reading fails
+    }
+}
+thread.start()
+```
+
+---
+
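+#### WAV Playback Driving
+
+> Function definition: `ai.guiji.duix.sdk.client.DUIX`
+
+```
+void playAudio(String wavPath)
+```
+
+**This function keeps the legacy WAV-driven interface working; internally it drives the digital human through PCM streaming.**
+
+
+**Parameters**:
+
+| Parameter | Type   | Description |
+|-----------|--------|-------------|
+| wavPath   | String | Local WAV file, 16 kHz sample rate, mono, 16-bit depth |
+
+
+**Call Example**: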
+
+```kotlin
+duix?.playAudio(wavPath)
+```
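`playAudio` expects a 16 kHz, mono, 16-bit WAV file, so it can help to validate the header first. This sketch walks the RIFF chunks until it finds `fmt `, then checks the format fields. `WavFormatCheck` is a hypothetical helper. It deliberately rejects `WAVE_FORMAT_EXTENSIBLE` files, even when their payload would be compatible.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Checks whether a WAV header describes 16 kHz, mono, 16-bit integer PCM. */
final class WavFormatCheck {
    private WavFormatCheck() {}

    static boolean isDuixCompatible(byte[] header) {
        if (header.length < 12) return false;
        if (!tag(header, 0).equals("RIFF") || !tag(header, 8).equals("WAVE")) return false;
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);
        long pos = 12;
        // Walk the chunk list until the "fmt " chunk is found.
        while (pos + 8 <= header.length) {
            int p = (int) pos;
            String id = tag(header, p);
            long size = buf.getInt(p + 4) & 0xFFFFFFFFL; // chunk sizes are unsigned
            if (id.equals("fmt ")) {
                if (size < 16 || p + 24 > header.length) return false;
                int audioFormat = buf.getShort(p + 8) & 0xFFFF; // 1 = integer PCM
                int channels = buf.getShort(p + 10) & 0xFFFF;
                int sampleRate = buf.getInt(p + 12);
                int bitsPerSample = buf.getShort(p + 22) & 0xFFFF;
                return audioFormat == 1 && channels == 1
                        && sampleRate == 16_000 && bitsPerSample == 16;
            }
            pos += 8 + size + (size & 1); // chunks are padded to an even length
        }
        return false;
    }

    private static String tag(byte[] b, int off) {
        return new String(b, off, 4, StandardCharsets.US_ASCII);
    }
}
```

To use it, read the first few hundred bytes of the file and pass exactly the bytes that were read. If the check fails, the file can be converted before calling `playAudio`.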
+
+Audio playback status and progress callbacks:
+
+```kotlin
+object : Callback {
+    override fun onEvent(event: String, msg: String?, info: Any?) {
+        when (event) {
+            // ...
+
+            "play.start" -> {
+                // Audio playback started
+            }
+
+            "play.end" -> {
+                // Audio playback finished
+            }
+            "play.error" -> {
+                // Audio playback error
+            }
+        }
+    }
+}
+```
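The `play.*` events above are enough to track whether the avatar is currently speaking. You might use this, for example, to decide whether to call `stopAudio()` before starting new audio. This is a minimal sketch. `PlaybackTracker` is hypothetical and relies only on the three event names shown above.

```java
/** Tracks audio playback state from the event strings delivered to Callback.onEvent. */
final class PlaybackTracker {
    enum State { IDLE, PLAYING, ERROR }

    private volatile State state = State.IDLE;

    /** Feed every event from onEvent here; events other than play.* are ignored. */
    State onEvent(String event) {
        if (event == null) return state;
        switch (event) {
            case "play.start": state = State.PLAYING; break;
            case "play.end":   state = State.IDLE;    break;
            case "play.error": state = State.ERROR;   break;
            default: break; // not a playback event
        }
        return state;
    }

    State state() { return state; }
}
```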
+
+---
+
+#### Stopping the Current Speech
+
+Call this interface to stop the digital human while it is speaking.
+
+> Function definition: `ai.guiji.duix.sdk.client.DUIX`
+
+```
+boolean stopAudio();
+```
+
+**Call Example**:
+
+```kotlin
+duix?.stopAudio()
+```
+
+---
+
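+### 6.5 Motion Control
+
+
+#### Playing a Specific Motion Interval
+
+Models can define named motion intervals in `SpecialAction.json`.
+
+> Function definition: `ai.guiji.duix.sdk.client.DUIX`
+
+```
+/**
+ * Play a specific motion interval
+ * @param name Motion interval name; after init succeeds, available intervals can be obtained from {@link ModelInfo#getSilenceRegion()}
+ * @param now true to play immediately; false to start after the current silent or motion interval finishes
+ */
+void startMotion(String name, boolean now)
+```
+
+**Call Example**: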
+
+```kotlin
+duix?.startMotion("打招呼", true) // "打招呼" means "Greeting"; the name must match an interval defined by the model
+```
+
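+#### Randomly Playing a Motion Interval
+
+For random playback, and for models that use the legacy annotation format (`config.json`).
+
+> Function definition: `ai.guiji.duix.sdk.client.DUIX`
+
+```
+/**
+ * Randomly play a motion interval
+ * @param now true to play immediately; false to start after the current silent or motion interval finishes
+ */
+void startRandomMotion(boolean now);
+```
+
+**Call Example**: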
+
+```kotlin
+duix?.startRandomMotion(true)
+```
+
+---
+
+## 7. ProGuard Configuration
+
+If your code is obfuscated, add the following to `proguard-rules.pro`:
+
+```proguard
+-keep class ai.guiji.duix.DuixNcnn{*; }
+```
+
+---
+
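+## 8. Precautions
+
+1. Before initializing rendering, make sure the base configuration and model files have been downloaded to the expected location.
+2. Keep PCM audio reasonably short: pushed PCM is buffered in memory, and very long streams can cause out-of-memory errors.
+3. To switch the preview model, change `modelUrl` in `MainActivity.kt`; the SDK's built-in download and unzip management then fetches the complete model files.
+4. Audio driving format: 16 kHz sample rate, mono, 16-bit depth.
+5. On underpowered devices, audio feature extraction may fall behind playback. Use `duix?.setReporter()` to add a monitor that observes frame-rendering information.
+6. Each segment between `startPush` and `stopPush` must contain at least 1 second of audio (32000 bytes); otherwise lip-sync is not triggered. Pad shorter segments with silence.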
+
+---
+
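+## 9. FAQ and Troubleshooting Guide
+
+| Issue                  | Possible Cause                                | Solution |
+|------------------------|-----------------------------------------------|----------|
+| init callback fails    | Wrong model path or download not complete     | Use `checkModel` to check the model status |
+| Black screen           | Wrong EGL configuration or texture view setup | Use the settings from the SDK sample |
+| No speech from PCM     | Wrong audio format or `startPush` not called  | Verify the audio format and call the push methods in order |
+| Slow model download    | Unstable network or restricted CDN            | Host the model files on your own storage service |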
+
+---
+
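+## 10. Version History
+
+**<a>4.0.1</a>**
+
+```text
+1. Supports driving the digital human with a PCM audio stream, improving audio playback response time.
+2. Optimized motion interval playback; specific motion intervals can be played based on the model configuration.
+3. Replaced ExoPlayer with a custom audio player, removing the ExoPlayer dependency.
+4. Provided simple model download and synchronization management tools.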
+```
+
+**<a>3.0.5</a>**
+
+```text
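+1. Updated libonnxruntime.so for 32-bit ARM CPUs to fix compatibility issues.
+2. Reworked motion interval playback: supports random and sequential playback; you must explicitly stop motion playback to return to the silent interval.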
+```
+
+**<a>3.0.4</a>**
+
+```text
+1. Fixed avatars not displaying correctly on some devices due to low default GL float precision.
+```
+
+**<a>3.0.3</a>**
+
+```text
+1. Optimized local rendering.
+```
+
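+## 11. 🔗 Open-source Dependencies
+
+| Module                                    | Description |
+|-------------------------------------------|-------------|
+| [onnx](https://github.com/onnx/onnx)      | General-purpose standard format for AI models |
+| [ncnn](https://github.com/Tencent/ncnn)   | High-performance neural network inference framework (Tencent) |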
+
+---
+
+For more help, please contact the technical support team.
--- a/android_glide_lint.xml
+++ b/android_glide_lint.xml
@ -0,0 +1,7 @@
+<?xml version="1.0" encoding="utf-8" ?>
+<!-- https://github.com/bumptech/glide/issues/4940 -->
+<lint>
+    <issue id="NotificationPermission">
+        <ignore regexp="com.bumptech.glide.request.target.NotificationTarget" />
+    </issue>
+</lint>
--- a/build.gradle
+++ b/build.gradle
@ -0,0 +1,39 @@
+// Top-level build file where you can add configuration options common to all sub-projects/modules.
+buildscript {
+    repositories {
+
+        maven { url 'https://maven.aliyun.com/repository/public/' }
+        maven { url 'https://maven.aliyun.com/repository/central' }
+        maven { url 'https://maven.aliyun.com/repository/google' }
+        maven { url 'https://maven.aliyun.com/repository/gradle-plugin' }
+        maven { url 'https://jitpack.io' }
+        maven { url 'https://repo1.maven.org/maven2/' }
+        google()
+    }
+    dependencies {
+        classpath 'com.android.tools.build:gradle:8.1.2'
+        classpath 'org.jetbrains.kotlin:kotlin-gradle-plugin:1.8.10'
+    }
+}
+
+allprojects {
+    repositories {
+
+        maven { url 'https://maven.aliyun.com/repository/public/' }
+        maven { url 'https://maven.aliyun.com/repository/central' }
+        maven { url 'https://maven.aliyun.com/repository/google' }
+        maven { url 'https://maven.aliyun.com/repository/gradle-plugin' }
+        maven { url 'https://jitpack.io' }
+        maven { url 'https://repo1.maven.org/maven2/' }
+        google()
+    }
+}
+
+ext {
+    compileSdkVersion = 33
+    buildToolsVersion = '30.0.2'
+    minSdkVersion = 24
+    targetSdkVersion = 33
+    versionCode = 2
+    versionName = "0.0.2"
+}
--- a/demo.jks
+++ b/demo.jks
--- a/duix-sdk/.gitignore
+++ b/duix-sdk/.gitignore
@ -0,0 +1 @@
+/build
--- a/duix-sdk/build.gradle
+++ b/duix-sdk/build.gradle
@ -0,0 +1,68 @@
+plugins {
+    id 'com.android.library'
+}
+
+android {
+    namespace 'ai.guiji.duix.sdk.client'
+    compileSdk 33
+
+    defaultConfig {
+        minSdk 24
+        versionCode 13
+        versionName '4.1.1'
+
+        externalNativeBuild {
+            cmake {
+                abiFilters 'arm64-v8a', "armeabi-v7a"
+                cppFlags "-std=c++17", "-fexceptions"
+                //arguments "-DANDROID_STL=c++_shared","-DANDROID_TOOLCHAIN=clang"
+            }
+        }
+    }
+
+    buildTypes {
+        debug {
+            minifyEnabled false
+            proguardFiles getDefaultProguardFile('proguard-android-optimize.txt'), 'proguard-rules.pro'
+
+            buildConfigField("String", "VERSION_NAME", "\"${defaultConfig.versionName}\"")
+            buildConfigField('int', 'VERSION_CODE', "${defaultConfig.versionCode}")
+        }
+
+        release {
+            minifyEnabled false
+            proguardFiles getDefaultProguardFile('proguard-android-optimize.txt'), 'proguard-rules.pro'
+
+            buildConfigField("String", "VERSION_NAME", "\"${defaultConfig.versionName}\"")
+            buildConfigField('int', 'VERSION_CODE', "${defaultConfig.versionCode}")
+
+            android.libraryVariants.all { variant ->
+                variant.outputs.all {
+                    outputFileName = "duix_client_sdk_${buildType.name}_${defaultConfig.versionName}.aar"
+                }
+            }
+        }
+    }
+
+    externalNativeBuild {
+        cmake {
+            path "src/main/cpp/CMakeLists.txt"
+            version "3.18.1"
+        }
+    }
+
+    compileOptions {
+        sourceCompatibility JavaVersion.VERSION_1_8
+        targetCompatibility JavaVersion.VERSION_1_8
+    }
+//    kotlinOptions {
+//        jvmTarget = '1.8'
+//    }
+//    packagingOptions {
+//        exclude 'lib/**/libonnxruntime.so'
+//    }
+}
+
+dependencies {
+    api fileTree(include: ['*.jar', '*.aar'], dir: 'libs')
+}
--- a/duix-sdk/consumer-rules.pro
+++ b/duix-sdk/consumer-rules.pro
--- a/duix-sdk/libs/resource_loader.jar
+++ b/duix-sdk/libs/resource_loader.jar
--- a/duix-sdk/proguard-rules.pro
+++ b/duix-sdk/proguard-rules.pro
@ -0,0 +1,90 @@
+# Add project specific ProGuard rules here.
+# You can control the set of applied configuration files using the
+# proguardFiles setting in build.gradle.
+#
+# For more details, see
+#   http://developer.android.com/guide/developing/tools/proguard.html
+
+# If your project uses WebView with JS, uncomment the following
+# and specify the fully qualified class name to the JavaScript interface
+# class:
+#-keepclassmembers class fqcn.of.javascript.interface.for.webview {
+#   public *;
+#}
+
+# Uncomment this to preserve the line number information for
+# debugging stack traces.
+#-keepattributes SourceFile,LineNumberTable
+
+# If you keep the line number information, uncomment this to
+# hide the original source file name.
+#-renamesourcefileattribute SourceFile
+
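+-optimizationpasses 5  # number of optimization passes; 5 is the usual value and rarely needs changing
+-dontusemixedcaseclassnames # do not generate mixed-case class names
+# Tell ProGuard not to skip non-public library classes (they are skipped by default)
+-dontskipnonpubliclibraryclasses # relevant when the app includes jar libraries whose classes are also obfuscated
+-verbose # print detailed logs while processing
+# Optimizations to apply; the argument is a filter
+# This filter is the one Google recommends and rarely needs changing
+-optimizations !code/simplification/arithmetic,!field/*,!class/merging/*
+# Keep generic signatures; without this, casts involving generics may fail at runtime
+-keepattributes Signature
+# Keep annotations; required if the project uses annotations, and important for JSON entity mapping (e.g. fastjson)
+-keepattributes *Annotation*
+# Keep source line numbers in exception stack traces
+-keepattributes SourceFile,LineNumberTable
+# Do not obfuscate native method names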
+-keepclasseswithmembernames class * {
+    native <methods>;
+}
+
+# Keep Activity methods that take a View (onClick handlers declared in layout XML must not be obfuscated)
+-keepclassmembers class * extends android.app.Activity {
+    public void *(android.view.View);
+}
+# Keep constructors and setters of custom views
+-keep public class * extends android.view.View {
+    public <init>(android.content.Context);
+    public <init>(android.content.Context, android.util.AttributeSet);
+    public <init>(android.content.Context, android.util.AttributeSet, int);
+    public void set*(...);
+}
+# Keep enum values() and valueOf()
+-keepclassmembers enum * {
+    public static **[] values();
+    public static ** valueOf(java.lang.String);
+}
+# Keep Parcelable CREATOR fields (AIDL classes must not be obfuscated)
+-keep class * implements android.os.Parcelable {
+    public static final android.os.Parcelable$Creator *;
+}
+# Keep names of Serializable classes (classes accessed via Java reflection must not be obfuscated either)
+-keepnames class * implements java.io.Serializable
+# Keep the serialization-related members of Serializable classes
+-keepclassmembers class * implements java.io.Serializable {
+    static final long serialVersionUID;
+    private static final java.io.ObjectStreamField[] serialPersistentFields;
+    !static !transient <fields>;
+    private void writeObject(java.io.ObjectOutputStream);
+    private void readObject(java.io.ObjectInputStream);
+    java.lang.Object writeReplace();
+    java.lang.Object readResolve();
+}
+# Keep R classes; otherwise resource IDs cannot be looked up via reflection
+-keep class **.R$* { *; }
+
+-keepclassmembers class * {
+   public <init> (org.json.JSONObject);
+}
+
+-keepclassmembers enum * {
+    public static **[] values();
+    public static ** valueOf(java.lang.String);
+}
+
+
+# App-specific rules
+
+
+-keep class ai.guiji.duix.DuixNcnn{*; }
--- a/duix-sdk/src/main/AndroidManifest.xml
+++ b/duix-sdk/src/main/AndroidManifest.xml
@ -0,0 +1,4 @@
+<?xml version="1.0" encoding="utf-8"?>
+<manifest xmlns:android="http://schemas.android.com/apk/res/android">
+
+</manifest>
--- a/duix-sdk/src/main/cpp/CMakeLists.txt
+++ b/duix-sdk/src/main/cpp/CMakeLists.txt
@ -0,0 +1,199 @@
+cmake_minimum_required(VERSION 3.13.2)
+project(gjmywrt)
+
+#set(CMAKE_CXX_COMPILER g++)
+#set(CMAKE_C_COMPILER gcc)
+set(CMAKE_CXX_STANDARD 17)
+set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++17 -fPIC  -funwind-tables -fno-omit-frame-pointer")
+#set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -fPIC ")
+set(CMAKE_EXPORT_COMPILE_COMMANDS ON)
+set(CMAKE_BUILD_TYPE "Debug")
+set(ORT_NO_EXCEPTIONS FALSE)
+
+#set(DEVAUD false)
+option(DEVARM "Build for Android ARM using the prebuilt libraries under third/" TRUE)
+
+if(DEVARM)
+  set(OpenCV_DIR ${CMAKE_SOURCE_DIR}/third/opencv-mobile-4.6.0-android/sdk/native/jni)
+  find_package(OpenCV REQUIRED core imgproc highgui)
+
+  set(ncnn_DIR ${CMAKE_SOURCE_DIR}/third/ncnn-20231027-android-shared/${ANDROID_ABI}/lib/cmake/ncnn)
+  find_package(ncnn REQUIRED)
+
+  add_library(turbojpeg STATIC IMPORTED)
+  set_target_properties(turbojpeg
+    PROPERTIES IMPORTED_LOCATION
+    ${CMAKE_SOURCE_DIR}/third/arm/${ANDROID_ABI}/libturbojpeg.a)
+
+  add_library(libjpeg STATIC IMPORTED)
+  set_target_properties(libjpeg
+    PROPERTIES IMPORTED_LOCATION
+    ${CMAKE_SOURCE_DIR}/third/arm/${ANDROID_ABI}/libjpeg.a)
+
+  add_library(onnx-lib SHARED IMPORTED)
+  set_target_properties(
+    onnx-lib
+    PROPERTIES IMPORTED_LOCATION
+    ${CMAKE_SOURCE_DIR}/third/arm/${ANDROID_ABI}/libonnxruntime.so)
+endif()
+
+option(USE_OPENCV "Enable OpenCV" TRUE)
+option(USE_NCNN "Enable ncnn" TRUE)
+option(USE_OPENVINO "Enable OpenVINO" FALSE)
+set(THIRD_INC "third/include")
+
+if(DEVARM)
+  set(THIRD_LIB "third/libarm")
+else()
+  set(THIRD_LIB "third/lib64")
+endif()
+
+
+
+if(DEVARM)
+
+  include_directories(
+    include
+    dhcore
+    dhmfcc
+    aes
+    android
+    third/arm/include
+    third/arm/include/onnx
+    third/arm/include/ncnn
+    third/arm/include/turbojpeg
+  )
+else()
+
+  include_directories(
+    include
+    dhcore
+    dhmfcc
+    aes
+    third2/include
+    third2/inc2404
+    third2/include/onnx
+    third2/include/turbojpeg
+    third2/include/ncnn
+    /usr/local/include/opencv4
+  )
+
+  link_directories(
+    ${CMAKE_SOURCE_DIR}/lib64
+    ${CMAKE_SOURCE_DIR}/third2/lib64
+    ${CMAKE_SOURCE_DIR}/third2/lib2404
+    /usr/local/lib
+  )
+endif()
+
+add_library(dhcore STATIC
+  dhcore/dh_mem.c
+  dhcore/dh_data.cpp
+  dhcore/dh_que.cpp
+)
+
+target_link_libraries(dhcore
+  -lm -lz -pthread
+)
+
+
+
+
+
+add_library(dhmfcc STATIC
+  dhmfcc/dhpcm.cpp
+  dhmfcc/dhwenet.cpp
+  dhmfcc/wenetai.cpp
+  dhmfcc/AudioFFT.cpp
+  dhmfcc/iir_filter.cpp
+  dhmfcc/mfcc.cpp
+)
+
+target_link_libraries(dhmfcc
+  dhcore
+  -lz -lm 
+)
+
+target_compile_options(dhmfcc   PRIVATE
+  -std=c++17
+)
+
+include_directories(
+  include
+  dhunet
+)
+
+add_library(dhunet STATIC
+  dhunet/jmat.cpp
+  dhunet/blendgram.cpp
+  dhunet/face_utils.cpp
+  dhunet/malpha.cpp
+  dhunet/munet.cpp
+)
+
+target_link_libraries(dhunet
+  dhcore
+  dhmfcc
+  -lz -lm 
+)
+
+if(DEVARM)
+
+  add_library(gjduix SHARED
+    duix/gjduix.cpp
+    duix/gjsimp.cpp
+    android/Log.cpp
+    android/DuixJni.cpp
+    android/JniHelper.cpp
+    aes/aes_cbc.c  aes/aes_core.c  aes/aes_ecb.c  aes/base64.c  aes/cbc128.c  aes/gj_aes.c
+    aes/aesmain.c
+  )
+
+  target_link_libraries(gjduix
+    dhcore
+    dhmfcc
+    dhunet
+    ${OpenCV_LIBS}
+    log  # Android system logging library (liblog)
+    ncnn
+    onnx-lib
+    libjpeg
+    turbojpeg
+    -lz -lm 
+    -landroid
+  )
+
+else()
+  add_library(gjduix SHARED
+    duix/gjduix.cpp
+    duix/gjsimp.cpp
+  )
+
+  target_link_libraries(gjduix
+    dhcore
+    dhmfcc
+    dhunet
+    -ljpeg
+    -lopencv_core
+    -lopencv_imgproc
+    -lopencv_highgui
+    -lturbojpeg
+    -lonnxruntime
+    -lncnn
+    -lz -lm 
+  )
+
+
+endif()
+
+
+add_executable(duixtest
+  #iostest/testduix.cpp
+  iostest/testsimp.cpp
+)
+
+target_link_libraries(duixtest
+  dhcore
+  gjduix
+)
+
--- a/duix-sdk/src/main/cpp/aes/aes.h
+++ b/duix-sdk/src/main/cpp/aes/aes.h
@ -0,0 +1,41 @@
+
+
+#ifndef HEADER_AES_H
+# define HEADER_AES_H
+
+# include <stddef.h>
+
+# define AES_ENCRYPT     1
+# define AES_DECRYPT     0
+
+# define AES_MAXNR 14
+# define AES_BLOCK_SIZE 16
+
+struct aes_key_st {
+# ifdef AES_LONG
+    unsigned long rd_key[4 * (AES_MAXNR + 1)];
+# else
+    unsigned int rd_key[4 * (AES_MAXNR + 1)];
+# endif
+    int rounds;
+};
+
+typedef struct aes_key_st AES_KEY;
+
+
+int AES_set_encrypt_key(const unsigned char *userKey, const int bits, AES_KEY *key);
+int AES_set_decrypt_key(const unsigned char *userKey, const int bits, AES_KEY *key);
+
+void AES_encrypt(const unsigned char *in, unsigned char *out, const AES_KEY *key);
+void AES_decrypt(const unsigned char *in, unsigned char *out, const AES_KEY *key);
+
+void AES_ecb_encrypt(const unsigned char *in, unsigned char *out, const AES_KEY *key, 
+					const int enc);
+
+void AES_cbc_encrypt(const unsigned char *in, unsigned char *out,
+                     size_t length, const AES_KEY *key,
+                     unsigned char *ivec, const int enc);
+
+#endif
+
+
--- a/duix-sdk/src/main/cpp/aes/aes_cbc.c
+++ b/duix-sdk/src/main/cpp/aes/aes_cbc.c
@ -0,0 +1,23 @@
+/*
+ * Copyright 2002-2016 The OpenSSL Project Authors. All Rights Reserved.
+ *
+ * Licensed under the OpenSSL license (the "License").  You may not use
+ * this file except in compliance with the License.  You can obtain a copy
+ * in the file LICENSE in the source distribution or at
+ * https://www.openssl.org/source/license.html
+ */
+
+#include "aes.h"
+#include "modes.h"
+
+void AES_cbc_encrypt(const unsigned char *in, unsigned char *out,
+                     size_t len, const AES_KEY *key,
+                     unsigned char *ivec, const int enc)
+{
+
+    if (enc)
+        CRYPTO_cbc128_encrypt(in, out, len, key, ivec,
+                              (block128_f) AES_encrypt);
+    else
+        CRYPTO_cbc128_decrypt(in, out, len, key, ivec, (block128_f) AES_decrypt);
+}
--- a/duix-sdk/src/main/cpp/aes/aes_core.c
+++ b/duix-sdk/src/main/cpp/aes/aes_core.c
--- a/duix-sdk/src/main/cpp/aes/aes_ecb.c
+++ b/duix-sdk/src/main/cpp/aes/aes_ecb.c
@ -0,0 +1,24 @@
+/*
+ * Copyright 2002-2016 The OpenSSL Project Authors. All Rights Reserved.
+ *
+ * Licensed under the OpenSSL license (the "License").  You may not use
+ * this file except in compliance with the License.  You can obtain a copy
+ * in the file LICENSE in the source distribution or at
+ * https://www.openssl.org/source/license.html
+ */
+
+#include <assert.h>
+
+#include "aes.h"
+#include "aes_locl.h"
+
+void AES_ecb_encrypt(const unsigned char *in, unsigned char *out, const AES_KEY *key, const int enc)
+{
+    assert(in && out && key);
+    assert((AES_ENCRYPT == enc) || (AES_DECRYPT == enc));
+
+    if (AES_ENCRYPT == enc)
+        AES_encrypt(in, out, key);
+    else
+        AES_decrypt(in, out, key);
+}
--- a/duix-sdk/src/main/cpp/aes/aes_locl.h
+++ b/duix-sdk/src/main/cpp/aes/aes_locl.h
@ -0,0 +1,42 @@
+/*
+ * Copyright 2002-2016 The OpenSSL Project Authors. All Rights Reserved.
+ *
+ * Licensed under the OpenSSL license (the "License").  You may not use
+ * this file except in compliance with the License.  You can obtain a copy
+ * in the file LICENSE in the source distribution or at
+ * https://www.openssl.org/source/license.html
+ */
+
+#ifndef HEADER_AES_LOCL_H
+# define HEADER_AES_LOCL_H
+
+//# include <e_os2.h>
+# include <stdio.h>
+# include <stdlib.h>
+# include <string.h>
+
+# if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_AMD64) || defined(_M_X64))
+#  define SWAP(x) (_lrotl(x, 8) & 0x00ff00ff | _lrotr(x, 8) & 0xff00ff00)
+#  define GETU32(p) SWAP(*((u32 *)(p)))
+#  define PUTU32(ct, st) { *((u32 *)(ct)) = SWAP((st)); }
+# else
+#  define GETU32(pt) (((u32)(pt)[0] << 24) ^ ((u32)(pt)[1] << 16) ^ ((u32)(pt)[2] <<  8) ^ ((u32)(pt)[3]))
+#  define PUTU32(ct, st) { (ct)[0] = (u8)((st) >> 24); (ct)[1] = (u8)((st) >> 16); (ct)[2] = (u8)((st) >>  8); (ct)[3] = (u8)(st); }
+# endif
+
+# ifdef AES_LONG
+typedef unsigned long u32;
+# else
+typedef unsigned int u32;
+# endif
+typedef unsigned short u16;
+typedef unsigned char u8;
+
+# define MAXKC   (256/32)
+# define MAXKB   (256/8)
+# define MAXNR   14
+
+/* This controls loop-unrolling in aes_core.c */
+# undef FULL_UNROLL
+
+#endif                          /* !HEADER_AES_LOCL_H */
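The non-MSVC `GETU32`/`PUTU32` macros above load and store a 32-bit word in big-endian byte order, independent of host endianness. A minimal C++ sketch of the same packing written as functions; the names `get_u32_be` and `put_u32_be` are illustrative and do not exist in this codebase:

```cpp
#include <cassert>
#include <cstdint>

// Load a big-endian 32-bit word, like GETU32(pt) in aes_locl.h.
// XOR and OR are interchangeable here because the shifted bytes never overlap.
static inline std::uint32_t get_u32_be(const std::uint8_t *p) {
    assert(p != nullptr);
    return (static_cast<std::uint32_t>(p[0]) << 24) ^ (static_cast<std::uint32_t>(p[1]) << 16) ^
           (static_cast<std::uint32_t>(p[2]) << 8) ^ static_cast<std::uint32_t>(p[3]);
}

// Store a 32-bit word in big-endian order, like PUTU32(ct, st) in aes_locl.h.
static inline void put_u32_be(std::uint8_t *ct, std::uint32_t st) {
    assert(ct != nullptr);
    ct[0] = static_cast<std::uint8_t>(st >> 24);
    ct[1] = static_cast<std::uint8_t>(st >> 16);
    ct[2] = static_cast<std::uint8_t>(st >> 8);
    ct[3] = static_cast<std::uint8_t>(st);
}
```

For example, the bytes `01 02 03 04` load as `0x01020304` on both little- and big-endian hosts.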
--- a/duix-sdk/src/main/cpp/aes/aesmain.c
+++ b/duix-sdk/src/main/cpp/aes/aesmain.c
@ -0,0 +1,111 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>
+#include <string.h>
+#include "gj_aes.h"
+#include "aesmain.h"
+
+int mainenc(int enc,char* infn,char* outfn){
+    char result[255] ;
+    memset(result,0,255);
+    char* key = "yymrjzbwyrbjszrk";
+    char* aiv = "yymrjzbwyrbjszrk";
+    int base64 = 1;
+    int outlen = 0;
+    int encrst = 0;
+    char* fn1 = infn;
+    char* fn2 = outfn;
+    FILE* fr = fopen(fn1,"rb");
+    FILE* fw = fopen(fn2,"wb");
+    while(1){
+        if(!fr){
+            encrst = -1001;
+            break;
+        }
+        if(!fw){
+            encrst = -1002;
+            break;
+        }
+        gj_aesc_t* aesc = NULL;
+        init_aesc(key,aiv,enc,&aesc);
+        uint64_t size = 0;
+        uint64_t realsize = 0;
+        if(enc){
+            fwrite("gjdigits",1,8,fw);
+            fwrite(&size,1,8,fw);
+            fwrite(&size,1,8,fw);
+            fwrite(&size,1,8,fw);
+
+            while(!feof(fr)){
+                char data[16];
+                memset(data,0,16);
+                uint64_t rst = fread(data,1,16,fr);
+                if(rst){
+                    size +=rst;
+                    do_aesc(aesc,data,16,result,&outlen);
+                    fwrite(result,1,outlen,fw);
+                }
+            }
+            fseek(fw,8,SEEK_SET); // go back and fill in the size field
+            fwrite(&size,1,8,fw);
+
+        }else{
+            uint64_t rst = fread(result,1,32,fr);
+            if(!rst){
+                encrst = -1003;
+                break;
+            }
+            if((result[0]!='g')||(result[1]!='j')){
+                encrst = -1004;
+                break;
+            }
+            uint64_t *psize = (uint64_t*)(result+8);
+            realsize = *psize;
+            if(realsize>1034*1024*1024){
+                encrst = -1005;
+                break;
+            }
+            while(!feof(fr)){
+                char data[16];
+                memset(data,0,16);
+                uint64_t rst = fread(data,1,16,fr);
+                if(rst){
+                    size +=rst;
+                    do_aesc(aesc,data,16,result,&outlen);
+                    if(size>realsize){
+                        outlen -= (size-realsize);
+                        //printf("===%lu > %lu rst %lu %d outlen \n",size,realsize,rst,outlen);
+                    }
+                    fwrite(result,1,outlen,fw);
+                }
+            }
+        }
+        break;
+    }
+    if(fr) fclose(fr);
+    if(fw) fclose(fw);
+    return encrst;
+}
+
+
+#ifdef TEST
+int main(int argc,char** argv){
+    if(argc<4){
+        printf("gaes enc|dec filein fileout\n");
+        return 0;
+    }
+    char k = argv[1][0];
+    if(k=='e'){
+        int rst =  mainenc(1,argv[2],argv[3]);
+        printf("====enc %s to %s rst %d\n",argv[2],argv[3],rst);
+        return rst;
+    }else if(k=='d'){
+        int rst =  mainenc(0,argv[2],argv[3]);
+        printf("====dec %s to %s rst %d\n",argv[2],argv[3],rst);
+        return rst;
+    }else{
+        printf("gaes enc|dec filein fileout\n");
+        return 0;
+    }
+}
+#endif
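`mainenc()` above defines a small container format: an 8-byte magic tag (`gjdigits`), the plaintext size as a host-order `uint64_t` at offset 8, two zeroed 8-byte fields, then the AES-CBC ciphertext in 16-byte blocks, with the last block zero-padded before encryption. The sketch below, assuming that layout, validates a header and computes the ciphertext length that follows it; `parse_gj_header` and `gj_payload_size` are hypothetical helpers, and the error codes simply reuse the ones `mainenc()` returns:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper: check the 32-byte header written by mainenc() and
// return the declared plaintext size. Like mainenc(), only the first two
// magic bytes are checked, and the same size limit is applied.
static int parse_gj_header(const unsigned char header[32], std::uint64_t *plain_size) {
    assert(header != nullptr && plain_size != nullptr);
    if (header[0] != 'g' || header[1] != 'j') return -1004;  // bad magic
    std::uint64_t size = 0;
    std::memcpy(&size, header + 8, sizeof size);  // bytes 8..15, host byte order
    if (size > 1034ull * 1024 * 1024) return -1005;  // implausibly large
    *plain_size = size;
    return 0;
}

// Hypothetical helper: ciphertext bytes after the header. The plaintext is
// encrypted 16 bytes at a time and the final partial block is zero-padded.
static std::uint64_t gj_payload_size(std::uint64_t plain_size) {
    return (plain_size + 15) / 16 * 16;
}
```

Decryption uses the same declared size to trim the zero padding from the final block, which is what the `outlen -= (size-realsize)` branch does.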
--- a/duix-sdk/src/main/cpp/aes/aesmain.h
+++ b/duix-sdk/src/main/cpp/aes/aesmain.h
@ -0,0 +1,16 @@
+
+#ifndef __AESMAIN_H
+#define __AESMAIN_H
+
+#include "gj_dll.h"
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+int mainenc(int enc,char* infn,char* outfn);
+
+#ifdef __cplusplus
+}
+#endif
+#endif
+
--- a/duix-sdk/src/main/cpp/aes/base64.c
+++ b/duix-sdk/src/main/cpp/aes/base64.c
@ -0,0 +1,164 @@
+/* This is a public domain base64 implementation written by WEI Zhicheng. */
+
+#include "base64.h"
+
+#define BASE64_PAD '='
+#define BASE64DE_FIRST '+'
+#define BASE64DE_LAST 'z'
+
+/* BASE 64 encode table */
+static const char base64en[] = {
+	'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H',
+	'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P',
+	'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X',
+	'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f',
+	'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n',
+	'o', 'p', 'q', 'r', 's', 't', 'u', 'v',
+	'w', 'x', 'y', 'z', '0', '1', '2', '3',
+	'4', '5', '6', '7', '8', '9', '+', '/',
+};
+
+/* ASCII order for BASE 64 decode; 255 marks an unused character */
+static const unsigned char base64de[] = {
+	/* nul, soh, stx, etx, eot, enq, ack, bel, */
+	   255, 255, 255, 255, 255, 255, 255, 255,
+
+	/*  bs,  ht,  nl,  vt,  np,  cr,  so,  si, */
+	   255, 255, 255, 255, 255, 255, 255, 255,
+
+	/* dle, dc1, dc2, dc3, dc4, nak, syn, etb, */
+	   255, 255, 255, 255, 255, 255, 255, 255,
+
+	/* can,  em, sub, esc,  fs,  gs,  rs,  us, */
+	   255, 255, 255, 255, 255, 255, 255, 255,
+
+	/*  sp, '!', '"', '#', '$', '%', '&', ''', */
+	   255, 255, 255, 255, 255, 255, 255, 255,
+
+	/* '(', ')', '*', '+', ',', '-', '.', '/', */
+	   255, 255, 255,  62, 255, 255, 255,  63,
+
+	/* '0', '1', '2', '3', '4', '5', '6', '7', */
+	    52,  53,  54,  55,  56,  57,  58,  59,
+
+	/* '8', '9', ':', ';', '<', '=', '>', '?', */
+	    60,  61, 255, 255, 255, 255, 255, 255,
+
+	/* '@', 'A', 'B', 'C', 'D', 'E', 'F', 'G', */
+	   255,   0,   1,   2,   3,   4,   5,   6,
+
+	/* 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', */
+	     7,   8,   9,  10,  11,  12,  13,  14,
+
+	/* 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', */
+	    15,  16,  17,  18,  19,  20,  21,  22,
+
+	/* 'X', 'Y', 'Z', '[', '\', ']', '^', '_', */
+	    23,  24,  25, 255, 255, 255, 255, 255,
+
+	/* '`', 'a', 'b', 'c', 'd', 'e', 'f', 'g', */
+	   255,  26,  27,  28,  29,  30,  31,  32,
+
+	/* 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', */
+	    33,  34,  35,  36,  37,  38,  39,  40,
+
+	/* 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', */
+	    41,  42,  43,  44,  45,  46,  47,  48,
+
+	/* 'x', 'y', 'z', '{', '|', '}', '~', del, */
+	    49,  50,  51, 255, 255, 255, 255, 255
+};
+
+unsigned int
+gjbase64_encode(const unsigned char *in, unsigned int inlen, char *out)
+{
+	int s;
+	unsigned int i;
+	unsigned int j;
+	unsigned char c;
+	unsigned char l;
+
+	s = 0;
+	l = 0;
+	for (i = j = 0; i < inlen; i++) {
+		c = in[i];
+
+		switch (s) {
+		case 0:
+			s = 1;
+			out[j++] = base64en[(c >> 2) & 0x3F];
+			break;
+		case 1:
+			s = 2;
+			out[j++] = base64en[((l & 0x3) << 4) | ((c >> 4) & 0xF)];
+			break;
+		case 2:
+			s = 0;
+			out[j++] = base64en[((l & 0xF) << 2) | ((c >> 6) & 0x3)];
+			out[j++] = base64en[c & 0x3F];
+			break;
+		}
+		l = c;
+	}
+
+	switch (s) {
+	case 1:
+		out[j++] = base64en[(l & 0x3) << 4];
+		out[j++] = BASE64_PAD;
+		out[j++] = BASE64_PAD;
+		break;
+	case 2:
+		out[j++] = base64en[(l & 0xF) << 2];
+		out[j++] = BASE64_PAD;
+		break;
+	}
+
+	out[j] = 0;
+
+	return j;
+}
+
+unsigned int
+gjbase64_decode(const char *in, unsigned int inlen, unsigned char *out)
+{
+	unsigned int i;
+	unsigned int j;
+	unsigned char c;
+
+	if (inlen & 0x3) {
+		return 0;
+	}
+
+	for (i = j = 0; i < inlen; i++) {
+		if (in[i] == BASE64_PAD) {
+			break;
+		}
+		if (in[i] < BASE64DE_FIRST || in[i] > BASE64DE_LAST) {
+			return 0;
+		}
+
+		c = base64de[(unsigned char)in[i]];
+		if (c == 255) {
+			return 0;
+		}
+
+		switch (i & 0x3) {
+		case 0:
+			out[j] = (c << 2) & 0xFF;
+			break;
+		case 1:
+			out[j++] |= (c >> 4) & 0x3;
+			out[j] = (c & 0xF) << 4;
+			break;
+		case 2:
+			out[j++] |= (c >> 2) & 0xF;
+			out[j] = (c & 0x3) << 6;
+			break;
+		case 3:
+			out[j++] |= c;
+			break;
+		}
+	}
+
+	return j;
+}
--- a/duix-sdk/src/main/cpp/aes/base64.h
+++ b/duix-sdk/src/main/cpp/aes/base64.h
@ -0,0 +1,29 @@
+#ifndef BASE64_H
+#define BASE64_H
+
+#define BASE64_ENCODE_OUT_SIZE(s) ((unsigned int)((((s) + 2) / 3) * 4 + 1))
+#define BASE64_DECODE_OUT_SIZE(s) ((unsigned int)(((s) / 4) * 3))
+
+#ifdef __cplusplus
+extern "C"{
+#endif
+
+/*
+ * out receives a NUL-terminated encoded string.
+ * Returns the length of out, excluding the terminating '\0'.
+ */
+unsigned int
+gjbase64_encode(const unsigned char *in, unsigned int inlen, char *out);
+
+/*
+ * Returns the number of bytes decoded into out (0 on invalid input).
+ */
+unsigned int
+gjbase64_decode(const char *in, unsigned int inlen, unsigned char *out);
+
+#ifdef __cplusplus
+}
+#endif
+
+
+#endif /* BASE64_H */
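`BASE64_DECODE_OUT_SIZE` is an upper bound: each trailing `=` reduces the real output by one byte, and `gjbase64_decode` may write one partial byte past the length it returns, so callers should still allocate the full macro size. A sketch of computing the exact decoded length for well-formed padded input; `base64_exact_decoded_size` is a hypothetical helper, and the two macros are copied from this header:

```cpp
#include <cassert>

// Copied from base64.h.
#define BASE64_ENCODE_OUT_SIZE(s) ((unsigned int)((((s) + 2) / 3) * 4 + 1))
#define BASE64_DECODE_OUT_SIZE(s) ((unsigned int)(((s) / 4) * 3))

// Hypothetical helper: the length gjbase64_decode() returns for a
// well-formed, '='-padded input of length len (a multiple of 4).
static unsigned int base64_exact_decoded_size(const char *in, unsigned int len) {
    assert(in != nullptr && len % 4 == 0);
    unsigned int n = BASE64_DECODE_OUT_SIZE(len);
    if (len >= 1 && in[len - 1] == '=') n--;  // last char is padding: one byte fewer
    if (len >= 2 && in[len - 2] == '=') n--;  // second-to-last too: another byte fewer
    return n;
}
```

For example, `"TWE="` decodes to the two bytes `Ma`, while `BASE64_DECODE_OUT_SIZE(4)` is 3.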
--- a/duix-sdk/src/main/cpp/aes/cbc128.c
+++ b/duix-sdk/src/main/cpp/aes/cbc128.c
@ -0,0 +1,161 @@
+/*
+ * Copyright 2008-2016 The OpenSSL Project Authors. All Rights Reserved.
+ *
+ * Licensed under the OpenSSL license (the "License").  You may not use
+ * this file except in compliance with the License.  You can obtain a copy
+ * in the file LICENSE in the source distribution or at
+ * https://www.openssl.org/source/license.html
+ */
+
+// #include <openssl/crypto.h>
+#include "modes.h"
+#include <string.h>
+
+#if !defined(STRICT_ALIGNMENT) && !defined(PEDANTIC)
+# define STRICT_ALIGNMENT 0
+#endif
+
+void CRYPTO_cbc128_encrypt(const unsigned char *in, unsigned char *out,
+                           size_t len, const void *key,
+                           unsigned char ivec[16], block128_f block)
+{
+    size_t n;
+    const unsigned char *iv = ivec;
+
+    if (len == 0)
+        return;
+
+#if !defined(OPENSSL_SMALL_FOOTPRINT)
+    if (STRICT_ALIGNMENT &&
+        ((size_t)in | (size_t)out | (size_t)ivec) % sizeof(size_t) != 0) {
+        while (len >= 16) {
+            for (n = 0; n < 16; ++n)
+                out[n] = in[n] ^ iv[n];
+            (*block) (out, out, key);
+            iv = out;
+            len -= 16;
+            in += 16;
+            out += 16;
+        }
+    } else {
+        while (len >= 16) {
+            for (n = 0; n < 16; n += sizeof(size_t))
+                *(size_t *)(out + n) =
+                    *(size_t *)(in + n) ^ *(size_t *)(iv + n);
+            (*block) (out, out, key);
+            iv = out;
+            len -= 16;
+            in += 16;
+            out += 16;
+        }
+    }
+#endif
+    while (len) {
+        for (n = 0; n < 16 && n < len; ++n)
+            out[n] = in[n] ^ iv[n];
+        for (; n < 16; ++n)
+            out[n] = iv[n];
+        (*block) (out, out, key);
+        iv = out;
+        if (len <= 16)
+            break;
+        len -= 16;
+        in += 16;
+        out += 16;
+    }
+    memcpy(ivec, iv, 16);
+}
+
+void CRYPTO_cbc128_decrypt(const unsigned char *in, unsigned char *out,
+                           size_t len, const void *key,
+                           unsigned char ivec[16], block128_f block)
+{
+    size_t n;
+    union {
+        size_t t[16 / sizeof(size_t)];
+        unsigned char c[16];
+    } tmp;
+
+    if (len == 0)
+        return;
+
+#if !defined(OPENSSL_SMALL_FOOTPRINT)
+    if (in != out) {
+        const unsigned char *iv = ivec;
+
+        if (STRICT_ALIGNMENT &&
+            ((size_t)in | (size_t)out | (size_t)ivec) % sizeof(size_t) != 0) {
+            while (len >= 16) {
+                (*block) (in, out, key);
+                for (n = 0; n < 16; ++n)
+                    out[n] ^= iv[n];
+                iv = in;
+                len -= 16;
+                in += 16;
+                out += 16;
+            }
+        } else if (16 % sizeof(size_t) == 0) { /* always true */
+            while (len >= 16) {
+                size_t *out_t = (size_t *)out, *iv_t = (size_t *)iv;
+
+                (*block) (in, out, key);
+                for (n = 0; n < 16 / sizeof(size_t); n++)
+                    out_t[n] ^= iv_t[n];
+                iv = in;
+                len -= 16;
+                in += 16;
+                out += 16;
+            }
+        }
+        memcpy(ivec, iv, 16);
+    } else {
+        if (STRICT_ALIGNMENT &&
+            ((size_t)in | (size_t)out | (size_t)ivec) % sizeof(size_t) != 0) {
+            unsigned char c;
+            while (len >= 16) {
+                (*block) (in, tmp.c, key);
+                for (n = 0; n < 16; ++n) {
+                    c = in[n];
+                    out[n] = tmp.c[n] ^ ivec[n];
+                    ivec[n] = c;
+                }
+                len -= 16;
+                in += 16;
+                out += 16;
+            }
+        } else if (16 % sizeof(size_t) == 0) { /* always true */
+            while (len >= 16) {
+                size_t c, *out_t = (size_t *)out, *ivec_t = (size_t *)ivec;
+                const size_t *in_t = (const size_t *)in;
+
+                (*block) (in, tmp.c, key);
+                for (n = 0; n < 16 / sizeof(size_t); n++) {
+                    c = in_t[n];
+                    out_t[n] = tmp.t[n] ^ ivec_t[n];
+                    ivec_t[n] = c;
+                }
+                len -= 16;
+                in += 16;
+                out += 16;
+            }
+        }
+    }
+#endif
+    while (len) {
+        unsigned char c;
+        (*block) (in, tmp.c, key);
+        for (n = 0; n < 16 && n < len; ++n) {
+            c = in[n];
+            out[n] = tmp.c[n] ^ ivec[n];
+            ivec[n] = c;
+        }
+        if (len <= 16) {
+            for (; n < 16; ++n)
+                ivec[n] = in[n];
+            break;
+        }
+        len -= 16;
+        in += 16;
+        out += 16;
+    }
+}
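`CRYPTO_cbc128_encrypt`/`CRYPTO_cbc128_decrypt` implement cipher block chaining: each plaintext block is XORed with the previous ciphertext block (the IV for the first block) before the block cipher runs, and `ivec` is left holding the last ciphertext block so a later call can continue the chain. A simplified sketch of the same structure over whole 16-byte blocks, with a toy invertible permutation standing in for AES; `toy_encrypt`, `toy_decrypt`, `cbc_encrypt` and `cbc_decrypt` are illustrative names, and the toy cipher is deliberately insecure:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Toy, INSECURE "block cipher": rotate the block left by one byte, XOR with key.
static void toy_encrypt(const unsigned char in[16], unsigned char out[16], const unsigned char key[16]) {
    unsigned char tmp[16];
    for (int i = 0; i < 16; ++i) tmp[i] = in[(i + 1) % 16] ^ key[i];
    std::memcpy(out, tmp, 16);
}
// Exact inverse of toy_encrypt.
static void toy_decrypt(const unsigned char in[16], unsigned char out[16], const unsigned char key[16]) {
    unsigned char tmp[16];
    for (int i = 0; i < 16; ++i) tmp[(i + 1) % 16] = in[i] ^ key[i];
    std::memcpy(out, tmp, 16);
}

// C_i = E(P_i XOR C_{i-1}), C_{-1} = IV; iv ends as the last ciphertext block.
static void cbc_encrypt(const unsigned char *in, unsigned char *out, std::size_t len,
                        const unsigned char key[16], unsigned char iv[16]) {
    assert(len % 16 == 0);
    for (std::size_t off = 0; off < len; off += 16) {
        unsigned char block[16];
        for (int n = 0; n < 16; ++n) block[n] = in[off + n] ^ iv[n];
        toy_encrypt(block, out + off, key);
        std::memcpy(iv, out + off, 16);
    }
}

// P_i = D(C_i) XOR C_{i-1}; C_i is saved first so in == out also works.
static void cbc_decrypt(const unsigned char *in, unsigned char *out, std::size_t len,
                        const unsigned char key[16], unsigned char iv[16]) {
    assert(len % 16 == 0);
    for (std::size_t off = 0; off < len; off += 16) {
        unsigned char saved[16], block[16];
        std::memcpy(saved, in + off, 16);
        toy_decrypt(saved, block, key);
        for (int n = 0; n < 16; ++n) out[off + n] = block[n] ^ iv[n];
        std::memcpy(iv, saved, 16);
    }
}
```

Because of the chaining, two identical plaintext blocks encrypt to different ciphertext blocks, which is not the case with `AES_ecb_encrypt`.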
--- a/duix-sdk/src/main/cpp/aes/gaes_stream.cc
+++ b/duix-sdk/src/main/cpp/aes/gaes_stream.cc
@ -0,0 +1,213 @@
+
+#include "gaes_stream.h"
+
+#include <cstring>
+#include <iostream>
+#include <fstream>
+#include <vector>
+#include <cstdio>
+#include <cstdlib>
+#include "gj_aes.h"
+
+
+class GaesIStreamBuf final: public std::streambuf
+{
+private:
+	char *m_inbuf;
+	size_t m_inbufsize;
+	bool m_owns_inbuf;
+	char *m_leftbuf;
+
+    FILE *file;
+    uint64_t cur_size;
+    uint64_t file_size;
+    gj_aesc_t* aesc ;
+protected:
+	virtual std::streambuf* setbuf(char *s, std::streamsize n){
+	    setg(0, 0, 0);
+	    if (m_owns_inbuf) {
+	        delete [] m_inbuf;
+	    }
+	    m_inbufsize = n;
+	    if (s) {
+	        m_inbuf = s;
+	        m_owns_inbuf = false;
+	    } else {
+	        m_inbuf = new char[m_inbufsize];
+	        m_leftbuf = new char[m_inbufsize];
+	        m_owns_inbuf = true;
+        }
+	    return this;
+    }
+
+	virtual int sync(){
+	    int result = 0;
+	    return result;
+    }
+
+    virtual int underflow() override{
+	    int __c = traits_type::eof();
+        if (!file) return __c;
+        if(cur_size>=file_size){
+            printf("===eof %llu ===%llu\n",(unsigned long long)cur_size,(unsigned long long)file_size);
+            return __c;
+        }
+	    bool initial = false;
+	    if (eback() == 0) {
+	        setg(m_inbuf, m_inbuf + m_inbufsize, m_inbuf + m_inbufsize);
+	        initial = true;
+        }
+	    const size_t unget_sz = initial ? 0 : std::min<size_t>((egptr() - eback()) / 2, 4);
+	    if (gptr() == egptr()) {
+	        memmove(eback(), egptr() - unget_sz, unget_sz);
+	        size_t nmemb = static_cast<size_t>(egptr() - eback() - unget_sz);
+            char* pdst = eback() + unget_sz;
+            int modb = nmemb % 16;
+            size_t leftb = nmemb - modb;
+            char* pbuf = m_leftbuf;
+            size_t leftf = file_size - cur_size;
+            if(leftb>leftf)leftb=leftf;
+            memset(pbuf,0,m_inbufsize);
+            size_t rd = fread(pbuf, 1, leftb, file);
+            //printf("%d-%ld-%ld----------------%ld--%ld#\n",cur_size,file_size,modb,nmemb,rd);
+	        //ssize_t readed = read(m_fd, eback() + unget_sz, nmemb);
+            if(rd>0){
+                cur_size += rd;
+                int cnt = leftb /16;
+                int k;
+                for(k=0;k<cnt;k++){
+                    int outlen = 0;
+                    do_aesc(aesc,pbuf,16,pdst,&outlen);
+                    pbuf += 16;
+                    pdst += 16;
+                }
+		        setg(eback(), eback() + unget_sz, eback() + unget_sz + rd);
+		        __c = traits_type::to_int_type(*gptr());
+            }
+	    } else {
+	        __c = traits_type::to_int_type(*gptr());
+        }
+	    return __c;
+    }
+public:
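+    GaesIStreamBuf(std::string& filename) :m_inbuf(0), m_inbufsize(0), m_owns_inbuf(false), m_leftbuf(0), file(NULL), cur_size(0), file_size(0), aesc(NULL){
+	    setbuf(0, 1024);
+        file = fopen(filename.c_str(), "rb");
+        if (!file) return; // open failed: underflow() will report EOF
+        fseek(file, 0, SEEK_END);
+        file_size = ftell(file); // total size of the encrypted file, header included
+        fseek(file, 0, SEEK_SET);
+        char key[] = "yymrjzbwyrbjszrk";
+        char aiv[] = "yymrjzbwyrbjszrk";
+        init_aesc(key,aiv,0,&this->aesc);
+        char head[50];
+        memset(head,0,50);
+        uint64_t rst = fread(head,1,8,file);  // 8-byte magic tag
+        rst = fread(&cur_size,1,8,file);      // declared plaintext size
+        printf("===head %s size %llu\n",head,(unsigned long long)cur_size);
+        rst = fread(head,1,16,file);          // two reserved 8-byte fields
+        cur_size = 32;                        // payload starts after the 32-byte header
+    }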
+
+    ~GaesIStreamBuf(){
+        close();
+	    if (m_owns_inbuf) {
+	        delete[] m_inbuf; delete[] m_leftbuf; // both allocated by setbuf()
+	    }
+    }
+
+    void close(){
+        if(aesc){
+            free_aesc(&this->aesc);
+        }
+        if (file){
+            fclose(file);
+            file = NULL;
+        }
+    }
+};
+
+
+
+GaesIStream::GaesIStream(std::string filename):
+    std::istream(new GaesIStreamBuf(filename)){
+}
+
+GaesIStream::~GaesIStream()
+{
+    delete rdbuf();
+}
+
+#ifdef TEST
+int maindec(int argc,char** argv){
+    std::string filename(argv[1]);// = "test.enc";
+    //std::string filename = "final.mdlenc";
+    GaesIStream fin(filename);
+    //std::string fn2 = "final.mdldec";
+    std::string fn2(argv[2]);// = "test.dec";
+    std::ofstream fout(fn2,std::ios::binary);
+
+    char buf[1024];
+    int rd = 0;
+    while(!fin.eof()){
+    //while((rd = fin.read(buf,16))>0){
+        //printf("===rd %ld\n",rd);
+        fin.read(buf,16);
+        fout.write(buf, fin.gcount()); // write only the bytes actually read
+
+    }
+    //char ch;
+    //while (fin.get(ch)) {
+        //printf("+");
+        //fout << ch;
+    //}
+    return 0;
+}
+
+
+
+int mainenc(int argc,char** argv){
+    char result[255] ;
+    memset(result,0,255);
+    char key[] = "yymrjzbwyrbjszrk";
+    char aiv[] = "yymrjzbwyrbjszrk";
+    int base64 = 1;
+    int outlen = 0;
+    gj_aesc_t* aesc = NULL;
+    init_aesc(key,aiv,1,&aesc);
+    char* fn1 = argv[1];
+    char* fn2 = argv[2];
+    FILE* fr = fopen(fn1,"rb");
+    FILE* fw = fopen(fn2,"wb");
+    fwrite("abcdefgh",1,8,fw);
+    uint64_t size = 0;
+    fwrite(&size,1,8,fw);
+    fwrite(&size,1,8,fw);
+    fwrite(&size,1,8,fw);
+    while(!feof(fr)){
+        char data[16];
+        memset(data,0,16);
+        uint64_t rst = fread(data,1,16,fr);
+        printf("===rst %llu\n",(unsigned long long)rst);
+        if(rst){
+            size +=rst;
+            do_aesc(aesc,data,16,result,&outlen);
+            printf("===out %d\n",outlen);
+            fwrite(result,1,16,fw);
+        }
+    }
+    fseek(fw,8,0);
+    fwrite(&size,1,8,fw);
+    fclose(fr);
+    fclose(fw);
+    return 0;
+}
+
+int main(int argc,char** argv){
+    if(argc<4){
+        return mainenc(argc,argv);
+    }else{
+        return maindec(argc,argv);
+    }
+}
+#endif
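`GaesIStreamBuf` plugs decryption into standard iostreams by overriding `underflow()`: when the get area is exhausted it reads the next chunk from the file, decrypts it into the buffer, and calls `setg()` to expose the decoded bytes. A reduced sketch of that pattern reading from an in-memory string, with a one-byte XOR standing in for AES-CBC (insecure, for illustration only); `XorChunkBuf` is a hypothetical name, and unlike the original it keeps no putback area:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <streambuf>
#include <string>
#include <utility>
#include <vector>

// Read-only streambuf that decodes fixed-size chunks on demand.
class XorChunkBuf final : public std::streambuf {
public:
    XorChunkBuf(std::string src, unsigned char key, std::size_t chunk = 16)
        : src_(std::move(src)), key_(key), buf_(chunk) {
        assert(chunk > 0);
    }

protected:
    int_type underflow() override {
        if (gptr() < egptr()) return traits_type::to_int_type(*gptr());  // data still buffered
        if (pos_ >= src_.size()) return traits_type::eof();              // source exhausted
        const std::size_t n = std::min(buf_.size(), src_.size() - pos_);
        for (std::size_t i = 0; i < n; ++i)  // decode the next chunk into buf_
            buf_[i] = static_cast<char>(static_cast<unsigned char>(src_[pos_ + i]) ^ key_);
        pos_ += n;
        setg(buf_.data(), buf_.data(), buf_.data() + n);  // expose the decoded bytes
        return traits_type::to_int_type(*gptr());
    }

private:
    std::string src_;
    unsigned char key_;
    std::vector<char> buf_;
    std::size_t pos_ = 0;
};
```

Usage mirrors `GaesIStream`: construct the buffer, wrap it in a `std::istream`, and read normally, e.g. `XorChunkBuf sb(data, 0x5A); std::istream in(&sb);`.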
--- a/duix-sdk/src/main/cpp/aes/gaes_stream.h
+++ b/duix-sdk/src/main/cpp/aes/gaes_stream.h
@ -0,0 +1,22 @@
+#ifndef COMPRESSED_STREAMS_ZSTD_STREAM_H
+#define COMPRESSED_STREAMS_ZSTD_STREAM_H
+
+#include <iostream>
+#include <string>
+
+
+
+
+class GaesIStream: public std::istream
+{
+public:
+    GaesIStream(std::string filename);
+
+    virtual ~GaesIStream();
+};
+
+
+
+
+
+#endif // COMPRESSED_STREAMS_ZSTD_STREAM_H
--- a/duix-sdk/src/main/cpp/aes/gaesmain
+++ b/duix-sdk/src/main/cpp/aes/gaesmain
--- a/duix-sdk/src/main/cpp/aes/gj_aes.c
+++ b/duix-sdk/src/main/cpp/aes/gj_aes.c
@ -0,0 +1,69 @@
+#include <stdlib.h>
+#include <string.h>
+#include "gj_aes.h"
+#include "base64.h"
+
+#include "aes.h"
+
+
+struct gj_aesc_s{
+    char key[16];
+    char iv[16];
+    int enc;
+	AES_KEY *aeskey;
+};
+
+int free_aesc(gj_aesc_t** paesc){
+    if(!paesc||!*paesc)return -1;
+    if((*paesc)->aeskey)free((*paesc)->aeskey);
+    free(*paesc);
+    *paesc = NULL;
+    return 0;
+}
+
+
+int init_aesc(char* key,char* iv,int enc,gj_aesc_t** paesc){
+    if(strlen(key)!=16) return -1;
+    if(strlen(iv)!=16) return -2;
+    gj_aesc_t* aesc = (gj_aesc_t*)malloc(sizeof(gj_aesc_t));
+    int k;
+    for(k=0;k<16;k++){
+        aesc->key[k]=key[k];
+        aesc->iv[k]=iv[k];
+    }
+    aesc->aeskey = (AES_KEY*)malloc(sizeof(AES_KEY));
+    aesc->enc = enc;
+    if(enc){
+	    AES_set_encrypt_key((const unsigned char*)aesc->key, 128, aesc->aeskey);
+    }else{
+	    AES_set_decrypt_key((const unsigned char*)aesc->key, 128, aesc->aeskey);
+    }
+    *paesc = aesc;
+    return 0;
+}
+
+int do_aesc(gj_aesc_t* aesc,char* in,int inlen,char* out,int* outlen){
+    char* psrc = in;
+    char* pdest = out;
+    int cnt = 0;
+    int left=inlen;
+    while(left>0){
+	    AES_cbc_encrypt((const unsigned char*)psrc,(unsigned char*)pdest,16,aesc->aeskey,(unsigned char*)aesc->iv,aesc->enc);
+        psrc += 16;
+        pdest += 16;
+        left -= 16;
+        cnt += 16;
+    }
+    *outlen = cnt;
+    return 0;
+}
+
+int do_base64(int enc,char* in,int inlen,char* out,int* outlen){
+    if(enc){
+        gjbase64_encode((unsigned char*)in,inlen,out);
+        *outlen = strlen(out);
+    }else{
+        *outlen = gjbase64_decode(in,inlen,(unsigned char*)out);
+    }
+    return 0;
+}
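`do_aesc` calls `AES_cbc_encrypt` once per 16-byte block and passes `aesc->iv`, which each call overwrites with the last ciphertext block. The chain therefore continues across separate `do_aesc` calls, so encrypting a file 16 bytes at a time (as `mainenc()` does) yields the same ciphertext as one CBC pass over the whole file. A sketch demonstrating that property with a toy XOR block function in place of AES; `toy_block` and `cbc_encrypt_inplace` are hypothetical:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Toy, INSECURE block function standing in for AES: XOR every byte with key.
static void toy_block(unsigned char b[16], unsigned char key) {
    for (int i = 0; i < 16; ++i) b[i] ^= key;
}

// CBC-encrypt len bytes (a multiple of 16) in place, carrying iv across calls
// the way do_aesc() carries aesc->iv across AES_cbc_encrypt() calls.
static void cbc_encrypt_inplace(unsigned char *buf, std::size_t len, unsigned char key,
                                unsigned char iv[16]) {
    assert(len % 16 == 0);
    for (std::size_t off = 0; off < len; off += 16) {
        for (int i = 0; i < 16; ++i) buf[off + i] ^= iv[i];  // P_i XOR C_{i-1}
        toy_block(buf + off, key);                           // C_i = E(P_i XOR C_{i-1})
        std::memcpy(iv, buf + off, 16);                      // next IV = C_i
    }
}
```

A consequence of this design is that a `gj_aesc_t` context is stateful: processing a second, independent file requires a fresh `init_aesc` call.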
--- a/duix-sdk/src/main/cpp/aes/gj_aes.h
+++ b/duix-sdk/src/main/cpp/aes/gj_aes.h
@ -0,0 +1,22 @@
+#ifndef __GJ_AES_H__
+#define __GJ_AES_H__
+
+#include "gj_dll.h"
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+typedef struct gj_aesc_s gj_aesc_t;
+
+GJLIBAPI int free_aesc(gj_aesc_t** paesc);
+GJLIBAPI int init_aesc(char* key,char* iv,int enc,gj_aesc_t** paesc);
+
+GJLIBAPI int do_aesc(gj_aesc_t* aesc,char* in,int inlen,char* out,int* outlen);
+
+GJLIBAPI int do_base64(int enc,char* in,int inlen,char* out,int* outlen);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
--- a/duix-sdk/src/main/cpp/aes/gj_dll.h
+++ b/duix-sdk/src/main/cpp/aes/gj_dll.h
@ -0,0 +1,21 @@
+#ifndef __GJ_DLL_H__
+#define __GJ_DLL_H__
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+#define GJLIB_EXPORT 1
+#if defined(GJLIB_EXPORT)
+    #if defined _WIN32 || defined __CYGWIN__
+        #define GJLIBAPI __declspec(dllexport)
+    #else
+        #define GJLIBAPI __attribute__((visibility("default")))
+    #endif
+#else
+    #define GJLIBAPI
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+#endif
--- a/duix-sdk/src/main/cpp/aes/makefile
+++ b/duix-sdk/src/main/cpp/aes/makefile
@ -0,0 +1,3 @@
+all:
+	g++ -fPIC -o gjaesmain -g aesmain.c \
+		aes_cbc.c aes_core.c aes_ecb.c cbc128.c base64.c gj_aes.c -lm --std=c++11 -I.   -DTEST
--- a/duix-sdk/src/main/cpp/aes/modes.h
+++ b/duix-sdk/src/main/cpp/aes/modes.h
@ -0,0 +1,22 @@
+#ifndef HEADER_MODES_H
+# define HEADER_MODES_H
+
+# include <stddef.h>
+
+typedef void (*block128_f) (const unsigned char in[16],
+                            unsigned char out[16], const void *key);
+
+typedef void (*cbc128_f) (const unsigned char *in, unsigned char *out,
+                          size_t len, const void *key,
+                          unsigned char ivec[16], int enc);
+
+void CRYPTO_cbc128_encrypt(const unsigned char *in, unsigned char *out,
+                           size_t len, const void *key,
+                           unsigned char ivec[16], block128_f block);
+void CRYPTO_cbc128_decrypt(const unsigned char *in, unsigned char *out,
+                           size_t len, const void *key,
+                           unsigned char ivec[16], block128_f block);
+
+
+
+#endif
--- a/duix-sdk/src/main/cpp/android/DuixJni.cpp
+++ b/duix-sdk/src/main/cpp/android/DuixJni.cpp
@ -0,0 +1,243 @@
+#include <android/asset_manager_jni.h>
+#include <android/native_window_jni.h>
+#include <android/native_window.h>
+#include <android/log.h>
+#include <jni.h>
+#include <string>
+#include <vector>
+#include <unistd.h>
+#include "gjsimp.h"
+#include "JniHelper.h"
+#include "aesmain.h"
+#include "jmat.h"
+#include "Log.h"
+
+#if __ARM_NEON
+#include <arm_neon.h>
+#endif // __ARM_NEON
+#define TAG  "tooken"
+#ifdef DEBUGME
+#define JNIEXPORT 
+#define JNI_OnLoad
+#define jint int
+#define jlong long
+#define jstring string
+#define JNICALL 
+#define JavaVM void
+#define LOGI(...)
+#define JNIEnv void
+#define jobject void*
+#endif
+extern "C" {
+
+  static dhduix_t* g_digit = 0;
+  static JMat*    g_gpgmat = NULL;
+  static int  g_width = 540;
+  static int  g_height = 960;
+  static int  g_taskid = -1;
+
+  JNIEXPORT jint JNI_OnLoad(JavaVM *vm, void *reserved) {
+    LOGD(TAG, "JNI_OnLoad");
+    //g_digit = new GDigit(g_width,g_height,g_msgcb);
+    JniHelper::sJavaVM = vm;
+    return JNI_VERSION_1_4;
+  }
+
+  JNIEXPORT void JNI_OnUnload(JavaVM *vm, void *reserved) {
+    LOGI(TAG, "unload");
+    if(g_digit){
+      dhduix_free(g_digit);
+      g_digit = nullptr;
+    }
+  }
+
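+  static std::string getStringUTF(JNIEnv *env, jstring obj) {
+    // Null-safe: a null jstring, or a failed conversion, yields an empty string.
+    if (obj == nullptr) return std::string();
+    const char *c_str = env->GetStringUTFChars(obj, nullptr);
+    if (c_str == nullptr) return std::string();  // OutOfMemoryError is pending
+    std::string tmpString(c_str);
+    env->ReleaseStringUTFChars(obj, c_str);
+    return tmpString;
+  }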
+
+
+  JNIEXPORT jint JNICALL Java_ai_guiji_duix_DuixNcnn_alloc(JNIEnv *env, jobject thiz,
+      jint taskid,jint mincalc,jint width,jint height){
+    LOGI(TAG, "create");
+    g_taskid = taskid;
+    dhduix_alloc(&g_digit,mincalc,width,height);
+    return 0;
+  }
+
+  JNIEXPORT jint JNICALL Java_ai_guiji_duix_DuixNcnn_free(JNIEnv *env, jobject thiz,jint taskid){
+    if(g_taskid==taskid){
+      dhduix_free(g_digit);
+      g_digit = nullptr;
+    }
+    return 0;
+  }
+
+  JNIEXPORT jint JNICALL Java_ai_guiji_duix_DuixNcnn_initPcmex(JNIEnv *env, jobject thiz, 
+      jint maxsize,jint minoff,jint minblock,jint maxblock,jint rgb){
+    if(!g_digit)return -1;
+    int rst = dhduix_initPcmex(g_digit,maxsize,minoff,minblock,maxblock,rgb);
+    return rst;
+  }
+
+  JNIEXPORT jint JNICALL Java_ai_guiji_duix_DuixNcnn_initWenet(JNIEnv *env, jobject thiz,
+      jstring fnwenet){
+    if(!g_digit)return -1;
+    std::string str = getStringUTF(env,fnwenet);
+    char* ps = (char*)(str.c_str());
+    int rst = dhduix_initWenet(g_digit,ps);
+    return rst;
+  }
+
+  JNIEXPORT jint JNICALL Java_ai_guiji_duix_DuixNcnn_initMunet(JNIEnv *env, jobject thiz,
+      jstring fnparam,jstring fnbin,jstring fnmask){
+    if(!g_digit)return -1;
+    std::string sparam = getStringUTF(env,fnparam);
+    std::string sbin = getStringUTF(env,fnbin);
+    std::string smask = getStringUTF(env,fnmask);
+    int rst = dhduix_initMunet(g_digit,(char*)sparam.c_str(),(char*)sbin.c_str(),(char*)smask.c_str());
+    return rst;
+  }
+
+  JNIEXPORT jint JNICALL Java_ai_guiji_duix_DuixNcnn_initMunetex(JNIEnv *env, jobject thiz,
+      jstring fnparam,jstring fnbin,jstring fnmask,jint kind){
+    if(!g_digit)return -1;
+    std::string sparam = getStringUTF(env,fnparam);
+    std::string sbin = getStringUTF(env,fnbin);
+    std::string smask = getStringUTF(env,fnmask);
+    int rst = dhduix_initMunetex(g_digit,(char*)sparam.c_str(),(char*)sbin.c_str(),(char*)smask.c_str(),kind?kind:168);
+    return rst;
+  }
+
+  JNIEXPORT jlong JNICALL Java_ai_guiji_duix_DuixNcnn_newsession(JNIEnv *env, jobject thiz){
+    if(!g_digit)return -1;
+    uint64_t sessid = dhduix_newsession(g_digit);
+    return (jlong)sessid;
+  }
+
+  JNIEXPORT jint JNICALL Java_ai_guiji_duix_DuixNcnn_pushpcm(JNIEnv *env, jobject thiz, 
+      jlong sessid,jbyteArray arrbuf,jint size,jint kind){
+    if(!g_digit)return -1;
+    // Reject sizes that would read past the end of the Java array.
+    if(!arrbuf || size<0 || size>env->GetArrayLength(arrbuf))return -1;
+    jbyte *pcmbuf = (jbyte *) env->GetPrimitiveArrayCritical(arrbuf, 0);
+    if(!pcmbuf)return -1;
+    uint64_t sid = sessid;
+    int rst = dhduix_pushpcm(g_digit,sid,(char*)pcmbuf,size,kind);
+    env->ReleasePrimitiveArrayCritical(arrbuf,pcmbuf, 0);
+    return rst;
+  }
+
+  JNIEXPORT jint JNICALL Java_ai_guiji_duix_DuixNcnn_finsession(JNIEnv *env, jobject thiz,jlong sessid){
+    if(!g_digit)return -1;
+    uint64_t sid = sessid;
+    return dhduix_finsession(g_digit,sid);
+  }
+
+  JNIEXPORT jint JNICALL Java_ai_guiji_duix_DuixNcnn_consession(JNIEnv *env, jobject thiz,jlong sessid){
+    if(!g_digit)return -1;
+    uint64_t sid = sessid;
+    return dhduix_consession(g_digit,sid);
+  }
+
+  JNIEXPORT jint JNICALL Java_ai_guiji_duix_DuixNcnn_allcnt(JNIEnv *env, jobject thiz,jlong sessid){
+    if(!g_digit)return -1;
+    uint64_t sid = sessid;
+    return dhduix_allcnt(g_digit,sid);
+  }
+
+  JNIEXPORT jint JNICALL Java_ai_guiji_duix_DuixNcnn_readycnt(JNIEnv *env, jobject thiz,jlong sessid){
+    if(!g_digit)return -1;
+    uint64_t sid = sessid;
+    return dhduix_readycnt(g_digit,sid);
+  }
+
+  JNIEXPORT jint JNICALL Java_ai_guiji_duix_DuixNcnn_fileload(JNIEnv* env, jobject thiz,
+      jstring picfile, jstring mskfile,jint width,jint height,
+      jbyteArray arrpic,jbyteArray arrmsk,jint bursize){
+    // loadjpg() reads from disk. Blocking I/O is not allowed inside a
+    // Get/ReleasePrimitiveArrayCritical region, so the non-critical
+    // Get/ReleaseByteArrayElements accessors are used instead.
+    std::string s_pic = getStringUTF(env,picfile);
+    std::string s_msk = getStringUTF(env,mskfile);
+    jbyte *picbuf = env->GetByteArrayElements(arrpic, nullptr);
+    if(!picbuf)return -1;
+    JMat* mat_pic = new JMat(width,height,(uint8_t*)picbuf);
+    mat_pic->loadjpg(s_pic,1);
+    env->ReleaseByteArrayElements(arrpic,picbuf, 0);
+    delete mat_pic;
+
+    if(s_msk.length()){
+        jbyte *mskbuf = env->GetByteArrayElements(arrmsk, nullptr);
+        if(!mskbuf)return -1;
+        JMat* mat_msk = new JMat(width,height,(uint8_t*)mskbuf);
+        mat_msk->loadjpg(s_msk,1);
+        env->ReleaseByteArrayElements(arrmsk,mskbuf, 0);
+        delete mat_msk;
+    }
+    return 0;
+  }
+
+  JNIEXPORT jint JNICALL Java_ai_guiji_duix_DuixNcnn_bufrst(JNIEnv* env, jobject thiz,
+      jlong sessid, jintArray arrbox, jint inx,
+      jbyteArray arrimg,jint imgsize){
+    if(!g_digit)return -1;
+    uint64_t sid = sessid;
+    jint *boxData = (jint*) env->GetPrimitiveArrayCritical( arrbox, 0);
+    jbyte *imgbuf = (jbyte*) env->GetPrimitiveArrayCritical(arrimg, 0);
+    int bnfinx = inx;
+    int rst = dhduix_simpinx(g_digit,sid,(uint8_t*)imgbuf, 0,0, 
+        (int*)boxData,NULL,NULL,bnfinx);
+    env->ReleasePrimitiveArrayCritical( arrimg,imgbuf, 0);
+    env->ReleasePrimitiveArrayCritical( arrbox, boxData, 0);
+    return rst;
+  }
+
+  JNIEXPORT jint JNICALL Java_ai_guiji_duix_DuixNcnn_filerst(JNIEnv* env, jobject thiz,
+      jlong sessid,jstring picfile, jstring mskfile,
+      jintArray arrbox, jstring fgfile,jint inx,
+      jbyteArray arrimg,jbyteArray arrmsk,jint imgsize){
+    if(!g_digit)return -1;
+    uint64_t sid = sessid;
+    std::string s_pic = getStringUTF(env,picfile);
+    std::string s_msk = getStringUTF(env,mskfile);
+    std::string s_fg = getStringUTF(env,fgfile);
+    // dhduix_fileinx() takes file paths and may therefore perform blocking file
+    // I/O, which is not allowed inside a critical region; use the non-critical
+    // array accessors instead.
+    jint *boxData = env->GetIntArrayElements(arrbox, nullptr);
+    jbyte *imgbuf = env->GetByteArrayElements(arrimg, nullptr);
+    jbyte *mskbuf = env->GetByteArrayElements(arrmsk, nullptr);
+    int rst = -1;
+    if(boxData && imgbuf && mskbuf){
+      rst = dhduix_fileinx(g_digit,sid,
+          (char*)s_pic.c_str(),(int*)boxData,
+          (char*)s_msk.c_str(),(char*)s_fg.c_str(),
+          inx,(char*)imgbuf,(char*)mskbuf,imgsize);
+    }
+    if(imgbuf)env->ReleaseByteArrayElements(arrimg,imgbuf, 0);
+    if(mskbuf)env->ReleaseByteArrayElements(arrmsk,mskbuf, 0);
+    if(boxData)env->ReleaseIntArrayElements(arrbox,boxData, 0);
+    return rst;
+  }
+
+  JNIEXPORT jint JNICALL Java_ai_guiji_duix_DuixNcnn_startgpg(JNIEnv *env, jobject thiz,
+      jstring picfn,jstring gpgfn){
+    std::string s_pic = getStringUTF(env,picfn);
+    std::string s_gpg = getStringUTF(env,gpgfn);
+    if(!g_gpgmat)g_gpgmat = new JMat();
+    int rst = g_gpgmat->loadjpg(s_pic);
+    if(rst)return rst;
+    rst = g_gpgmat->savegpg(s_gpg);
+    return rst;
+  }
+
+  JNIEXPORT jint JNICALL Java_ai_guiji_duix_DuixNcnn_processmd5(JNIEnv *env, jobject thiz,
+      jint kind,jstring infn,jstring outfn){
+    std::string s_in = getStringUTF(env,infn);
+    std::string s_out = getStringUTF(env,outfn);
+    int rst = mainenc(kind,(char*)s_in.c_str(),(char*)s_out.c_str());
+    return rst;
+  }
+
+  JNIEXPORT jint JNICALL Java_ai_guiji_duix_DuixNcnn_stopgpg(JNIEnv *env, jobject thiz){
+    if(g_gpgmat){
+      delete g_gpgmat;
+      g_gpgmat = NULL;
+    }
+    return 0;
+  }
+}
+
--- a/duix-sdk/src/main/cpp/android/JniHelper.cpp
+++ b/duix-sdk/src/main/cpp/android/JniHelper.cpp
@ -0,0 +1,384 @@
+#include <malloc.h>
+#include <cstdlib>
+#include <cstring>
+#include "JniHelper.h"
+#include "Log.h"
+
+#define TAG "JniHelper"
+
+using namespace std;
+
+JavaVM *JniHelper::sJavaVM = nullptr;
+
+// Returns the JNIEnv for the calling thread. If the thread is not yet attached
+// to the JVM it is attached here and stays attached: a JNIEnv is only valid
+// while its thread is attached, so the caller is responsible for calling
+// detachCurrentThread() once it no longer needs JNI access.
+JNIEnv *JniHelper::getJNIEnv() {
+    if (sJavaVM == nullptr) {
+        LOGE(TAG, "sJavaVM is nullptr");
+        return nullptr;
+    }
+
+    JNIEnv *env = nullptr;
+    switch (sJavaVM->GetEnv((void **) &env, JNI_VERSION_1_4)) {
+        case JNI_OK:
+            break;
+        case JNI_EDETACHED:
+            if (sJavaVM->AttachCurrentThread(&env, nullptr) != 0) {
+                LOGE(TAG, "Could not attach current thread");
+                env = nullptr;
+            }
+            break;
+        case JNI_EVERSION:
+            LOGE(TAG, "Invalid java version");
+            env = nullptr;
+            break;
+        default:
+            break;
+    }
+
+    return env;
+}
+
+bool JniHelper::attachCurrentThread() {
+    if (sJavaVM == nullptr) {
+        LOGE(TAG, "sJavaVM is nullptr");
+        return false;
+    }
+
+    JNIEnv *env = nullptr;
+    bool attached = false;
+    switch (sJavaVM->GetEnv((void **) &env, JNI_VERSION_1_4)) {
+        case JNI_OK:
+            break;
+        case JNI_EDETACHED:
+            if (sJavaVM->AttachCurrentThread(&env, nullptr) != 0) {
+                LOGE(TAG, "Could not attach current thread");
+            } else {
+                attached = true;
+            }
+            break;
+        case JNI_EVERSION:
+            LOGE(TAG, "Invalid java version");
+            break;
+        default:
+            break;
+    }
+
+    return attached;
+}
+
+void JniHelper::detachCurrentThread() {
+    if (sJavaVM == nullptr) {
+        LOGE(TAG, "sJavaVM is nullptr");
+        return;
+    }
+    sJavaVM->DetachCurrentThread();
+}
+
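+void JniHelper::throwException(JNIEnv *env, const char *className, const char *msg) {
+    jclass exception = env->FindClass(className);
+    if (exception == nullptr) {
+        return;  // FindClass has already thrown NoClassDefFoundError
+    }
+    env->ThrowNew(exception, msg);
+    env->DeleteLocalRef(exception);
+}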
+
+jstring JniHelper::newStringUTF(JNIEnv *env, const char *data) {
+    if (!data) return nullptr;
+    jstring str = nullptr;
+    // Build the string via new String(byte[], "UTF-8") instead of NewStringUTF(),
+    // which expects Modified UTF-8 and does not accept 4-byte (supplementary) sequences.
+    auto size = static_cast<jsize>(strlen(data));
+    jbyteArray array = env->NewByteArray(size);
+    if (!array) {  // OutOfMemoryError exception has already been thrown.
+        LOGE(TAG, "newStringUTF: OutOfMemoryError is thrown.");
+    } else {
+        env->SetByteArrayRegion(array, 0, size, (const jbyte *) data);
+        jclass string_Clazz = env->FindClass("java/lang/String");
+        jmethodID string_initMethodID = env->GetMethodID(string_Clazz, "<init>",
+                                                         "([BLjava/lang/String;)V");
+        jstring utf = env->NewStringUTF("UTF-8");
+        str = (jstring) env->NewObject(string_Clazz, string_initMethodID, array, utf);
+        env->DeleteLocalRef(utf);
+        env->DeleteLocalRef(string_Clazz);
+        env->DeleteLocalRef(array);
+    }
+    return str;
+}
+
+jobject JniHelper::createByteBuffer(JNIEnv *env, unsigned char *buffer, int size) {
+    if (env == nullptr || buffer == nullptr) {
+        return nullptr;
+    }
+
+    jobject byteBuffer = env->NewDirectByteBuffer(buffer, size);
+    //byteBuffer = env->NewGlobalRef(byteBuffer);
+
+    return byteBuffer;
+}
+
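+// Allocates `size` bytes of native memory and wraps them in a direct ByteBuffer.
+// The ByteBuffer does not own the memory: the caller must release it with
+// free(env->GetDirectBufferAddress(byteBuffer)) once the buffer is no longer used.
+jobject JniHelper::createByteBuffer(JNIEnv *env, int size) {
+    if (env == nullptr || size <= 0) {
+        return nullptr;
+    }
+
+    auto buffer = static_cast<uint8_t *>(malloc(static_cast<size_t>(size)));
+    if (buffer == nullptr) {
+        return nullptr;
+    }
+    jobject byteBuffer = env->NewDirectByteBuffer(buffer, size);
+    if (byteBuffer == nullptr) {
+        free(buffer);  // NewDirectByteBuffer failed; do not leak the allocation
+    }
+    return byteBuffer;
+}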
+
+void JniHelper::deleteLocalRef(jobject jobj) {
+    JNIEnv *env = JniHelper::getJNIEnv();
+    if (env == nullptr || jobj == nullptr) {
+        return;
+    }
+
+    env->DeleteLocalRef(jobj);
+}
+
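+string JniHelper::getStringUTF(JNIEnv *env, jstring obj) {
+    // Null-safe: a null jstring, or a failed conversion, yields an empty string.
+    if (obj == nullptr) return string();
+    const char *c_str = env->GetStringUTFChars(obj, nullptr);
+    if (c_str == nullptr) return string();  // OutOfMemoryError is pending
+    string tmpString(c_str);
+    env->ReleaseStringUTFChars(obj, c_str);
+    return tmpString;
+}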
+
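+// Returns a heap-allocated copy of the string's Modified UTF-8 bytes.
+// The caller owns the result and must release it with free().
+char *JniHelper::getCharArrayUTF(JNIEnv *env, jstring obj) {
+    if (obj == nullptr) return nullptr;
+    const char *c_str = env->GetStringUTFChars(obj, nullptr);
+    if (c_str == nullptr) return nullptr;  // OutOfMemoryError is pending
+    size_t len = strlen(c_str);
+    auto *copy = static_cast<char *>(malloc(len + 1));
+    if (copy != nullptr) {
+        memcpy(copy, c_str, len + 1);
+    }
+    env->ReleaseStringUTFChars(obj, c_str);
+    return copy;
+}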
+
+void JniHelper::callVoidMethod(jobject obj, jmethodID methodId) {
+    if (sJavaVM == nullptr) {
+        LOGE(TAG, "sJavaVM is nullptr");
+        return;
+    }
+
+    JNIEnv *env = nullptr;
+    bool attached = false;
+    switch (sJavaVM->GetEnv((void **) &env, JNI_VERSION_1_4)) {
+        case JNI_OK:
+            break;
+        case JNI_EDETACHED:
+            if (sJavaVM->AttachCurrentThread(&env, nullptr) != 0) {
+                LOGE(TAG, "Could not attach current thread");
+            } else {
+                attached = true;
+            }
+            break;
+        case JNI_EVERSION:
+            LOGE(TAG, "Invalid java version");
+            break;
+        default:
+            break;
+    }
+
+    if (env != nullptr) {
+        env->CallVoidMethod(obj, methodId);
+    }
+
+    if (attached) {
+        sJavaVM->DetachCurrentThread();
+    }
+}
+
+void JniHelper::callVoidMethod(jobject obj, jmethodID methodId, jint arg1, jint arg2, jint arg3, jint arg4) {
+    if (sJavaVM == nullptr) {
+        LOGE(TAG, "sJavaVM is nullptr");
+        return;
+    }
+
+    JNIEnv *env = nullptr;
+    bool attached = false;
+    switch (sJavaVM->GetEnv((void **) &env, JNI_VERSION_1_4)) {
+        case JNI_OK:
+            break;
+        case JNI_EDETACHED:
+            if (sJavaVM->AttachCurrentThread(&env, nullptr) != 0) {
+                LOGE(TAG, "Could not attach current thread");
+            } else {
+                attached = true;
+            }
+            break;
+        case JNI_EVERSION:
+            LOGE(TAG, "Invalid java version");
+            break;
+        default:
+            break;
+    }
+
+    if (env != nullptr) {
+        env->CallVoidMethod(obj, methodId, arg1, arg2, arg3, arg4);
+    }
+
+    if (attached) {
+        sJavaVM->DetachCurrentThread();
+    }
+}
+
+void
+JniHelper::callVoidMethod(jobject obj, jmethodID methodId, jint arg1, jint arg2, jint arg3,
+                          jstring arg4, jstring arg5, jobject arg6) {
+    if (sJavaVM == nullptr) {
+        LOGE(TAG, "sJavaVM is nullptr");
+        return;
+    }
+
+    JNIEnv *env = nullptr;
+    bool attached = false;
+    switch (sJavaVM->GetEnv((void **) &env, JNI_VERSION_1_4)) {
+        case JNI_OK:
+            break;
+        case JNI_EDETACHED:
+            if (sJavaVM->AttachCurrentThread(&env, nullptr) != 0) {
+                LOGE(TAG, "Could not attach current thread");
+            } else {
+                attached = true;
+            }
+            break;
+        case JNI_EVERSION:
+            LOGE(TAG, "Invalid java version");
+            break;
+        default:
+            break;
+    }
+
+    if (env != nullptr) {
+        env->CallVoidMethod(obj, methodId, arg1, arg2, arg3, arg4, arg5, arg6);
+    }
+
+    if (attached) {
+        sJavaVM->DetachCurrentThread();
+    }
+}
+
+int JniHelper::callIntMethod(jobject obj, jmethodID methodId, jobject arg1, jint arg2) {
+    if (sJavaVM == nullptr) {
+        LOGE(TAG, "sJavaVM is nullptr");
+        return -1;
+    }
+
+    JNIEnv *env = nullptr;
+    bool attached = false;
+    switch (sJavaVM->GetEnv((void **) &env, JNI_VERSION_1_4)) {
+        case JNI_OK:
+            break;
+        case JNI_EDETACHED:
+            if (sJavaVM->AttachCurrentThread(&env, nullptr) != 0) {
+                LOGE(TAG, "Could not attach current thread");
+            } else {
+                attached = true;
+            }
+            break;
+        case JNI_EVERSION:
+            LOGE(TAG, "Invalid java version");
+            break;
+        default:
+            break;
+    }
+
+    int ret = -1;
+    if (env != nullptr) {
+        ret = env->CallIntMethod(obj, methodId, arg1, arg2);
+    }
+
+    if (attached) {
+        sJavaVM->DetachCurrentThread();
+    }
+
+    return ret;
+}
+
+
+void JniHelper::callStaticVoidMethod(jclass cls, jmethodID methodId, jint arg1) {
+    if (sJavaVM == nullptr) {
+        LOGE(TAG, "sJavaVM is nullptr");
+        return;
+    }
+
+    JNIEnv *env = nullptr;
+    bool attached = false;
+    switch (sJavaVM->GetEnv((void **) &env, JNI_VERSION_1_4)) {
+        case JNI_OK:
+            break;
+        case JNI_EDETACHED:
+            if (sJavaVM->AttachCurrentThread(&env, nullptr) != 0) {
+                LOGE(TAG, "Could not attach current thread");
+            } else {
+                attached = true;
+            }
+            break;
+        case JNI_EVERSION:
+            LOGE(TAG, "Invalid java version");
+            break;
+        default:
+            break;
+    }
+
+    if (env != nullptr) {
+        env->CallStaticVoidMethod(cls, methodId, arg1);
+    }
+
+    if (attached) {
+        sJavaVM->DetachCurrentThread();
+    }
+}
+
+jobject JniHelper::callObjectMethod(jobject obj, jmethodID methodId) {
+    if (sJavaVM == nullptr) {
+        LOGE(TAG, "sJavaVM is nullptr");
+        return nullptr;
+    }
+
+    JNIEnv *env = nullptr;
+    bool attached = false;
+    switch (sJavaVM->GetEnv((void **) &env, JNI_VERSION_1_4)) {
+        case JNI_OK:
+            break;
+        case JNI_EDETACHED:
+            if (sJavaVM->AttachCurrentThread(&env, nullptr) != 0) {
+                LOGE(TAG, "Could not attach current thread");
+            } else {
+                attached = true;
+            }
+            break;
+        case JNI_EVERSION:
+            LOGE(TAG, "Invalid java version");
+            break;
+        default:
+            break;
+    }
+
+    jobject ret = nullptr;
+    if (env != nullptr) {
+        ret = env->CallObjectMethod(obj, methodId);
+    }
+
+    if (attached) {
+        sJavaVM->DetachCurrentThread();
+    }
+
+    return ret;
+}
+
+jboolean JniHelper::callBooleanMethod(jobject obj, jmethodID methodId) {
+    if (sJavaVM == nullptr) {
+        LOGE(TAG, "sJavaVM is nullptr");
+        return false;
+    }
+
+    JNIEnv *env = nullptr;
+    bool attached = false;
+    switch (sJavaVM->GetEnv((void **) &env, JNI_VERSION_1_4)) {
+        case JNI_OK:
+            break;
+        case JNI_EDETACHED:
+            if (sJavaVM->AttachCurrentThread(&env, nullptr) != 0) {
+                LOGE(TAG, "Could not attach current thread");
+            } else {
+                attached = true;
+            }
+            break;
+        case JNI_EVERSION:
+            LOGE(TAG, "Invalid java version");
+            break;
+        default:
+            break;
+    }
+
+    jboolean ret = JNI_FALSE;  // returned as-is if no JNIEnv could be obtained
+    if (env != nullptr) {
+        ret = env->CallBooleanMethod(obj, methodId);
+    }
+
+    if (attached) {
+        sJavaVM->DetachCurrentThread();
+    }
+
+    return ret;
+}
--- a/duix-sdk/src/main/cpp/android/JniHelper.h
+++ b/duix-sdk/src/main/cpp/android/JniHelper.h
@ -0,0 +1,50 @@
+#ifndef GPLAYER_JNIHELPER_H
+#define GPLAYER_JNIHELPER_H
+
+#include <jni.h>
+#include <string>
+
+using namespace std;
+
+class JniHelper {
+public:
+    static JNIEnv *getJNIEnv();
+
+    static bool attachCurrentThread();
+
+    static void detachCurrentThread();
+
+    static void throwException(JNIEnv *env, const char *className, const char *msg);
+
+    static jstring newStringUTF(JNIEnv *env, const char *data);
+
+    static string getStringUTF(JNIEnv *env, jstring obj);
+
+    static char *getCharArrayUTF(JNIEnv *env, jstring obj);
+
+    static jobject createByteBuffer(JNIEnv *env, unsigned char *buffer, int size);
+
+    static jobject createByteBuffer(JNIEnv *env, int size);
+
+    static void deleteLocalRef(jobject jobj);
+
+    static void callVoidMethod(jobject obj, jmethodID methodId);
+
+    static void callVoidMethod(jobject obj, jmethodID methodId, jint arg1, jint arg2, jint arg3, jint arg4);
+
+    static void callVoidMethod(jobject obj, jmethodID methodId, jint arg1, jint arg2,
+                               jint arg3, jstring arg4, jstring arg5, jobject arg6);
+
+    static int callIntMethod(jobject obj, jmethodID methodId, jobject arg1, jint arg2);
+
+    static void callStaticVoidMethod(jclass cls, jmethodID methodId, jint arg1);
+
+    static jobject callObjectMethod(jobject obj, jmethodID methodId);
+
+    static jboolean callBooleanMethod(jobject obj, jmethodID methodId);
+
+public:
+    static JavaVM *sJavaVM;
+};
+
+#endif //GPLAYER_JNIHELPER_H
--- a/duix-sdk/src/main/cpp/android/Log.cpp
+++ b/duix-sdk/src/main/cpp/android/Log.cpp
@ -0,0 +1,80 @@
+#if defined(_WIN32)
+#define _CRT_SECURE_NO_WARNINGS
+#endif
+
+#include "Log.h"
+
+#include <stdio.h>
+#include <time.h>
+#include <stdarg.h>
+
+#ifdef __ANDROID__
+#include <android/log.h>
+android_LogPriority s_android_logprio[LOG_TRACE + 1] = {
+        ANDROID_LOG_UNKNOWN,
+        ANDROID_LOG_FATAL,
+        ANDROID_LOG_ERROR,
+        ANDROID_LOG_WARN,
+        ANDROID_LOG_INFO,
+        ANDROID_LOG_DEBUG,
+        ANDROID_LOG_VERBOSE
+};
+
+#endif
+
+#if defined(_WIN32)
+#include <windows.h>
+#endif
+
+void __log_print(int lv, const char *tag, const char *funame, int line, const char *fmt, ...) {
+    char log_info[2040];
+    char *buf = log_info;
+    int ret, len = sizeof(log_info);
+
+// Android (logcat) does not need a timestamp prefix
+#ifndef __ANDROID__
+    /*
+    if (lv <= LogLevel::LOG_INFO) {    // prefix a timestamp for INFO and more severe levels
+        *buf++ = '[';
+        _get_curtime_str(buf);
+        //buf = buf + strlen(buf);
+        buf += 23;  // time format: XXXX-XX-XX XX:XX:XX.XXX, 23 bytes in total
+        *buf++ = ']';
+        *buf++ = ' ';
+
+        len -= buf - log_info;
+    }
+    */
+
+    if (lv <= LogLevel::LOG_WARN) {    // include function name and line number for WARN and more severe levels
+        ret = sprintf(buf, "%s line:%-4d ", funame, line);
+        buf += ret;
+        len -= ret;
+    }
+#endif
+
+    va_list arglist;
+    va_start(arglist, fmt);
+
+    int itemLen = buf - log_info;
+#if defined(_WIN32)
+    ret = _vsnprintf(buf, len - 1, fmt, arglist);
+#else
+    ret = vsnprintf(buf, len - 1, fmt, arglist);
+#endif
+    if (ret < 0) {
+        buf[len - 1] = 0;
+        buf[len - 2] = '\n';
+        itemLen += len - 1;
+    } else
+        itemLen += ret;
+
+    va_end(arglist);
+
+#if defined(__ANDROID__)
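+    // log_info is already formatted; write it verbatim so a '%' in the message
+    // is not reinterpreted as a format specifier.
+    __android_log_write(s_android_logprio[lv], tag, log_info);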
+#else
+    // Local (non-Android) output
+    //printf("Tag=%s %s\n", tag, log_info);
+#endif
+}
--- a/duix-sdk/src/main/cpp/android/Log.h
+++ b/duix-sdk/src/main/cpp/android/Log.h
@ -0,0 +1,44 @@
+#ifndef __GPLAYER_LOG_H__
+#define __GPLAYER_LOG_H__
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+// Debug log switch: 1 = on, any other value = off
+#define LOG_OPEN 0
+
+#define __ANDROID__ 1
+enum LogLevel
+{
+    LOG_OFF    = 0,    //!< logging disabled
+    LOG_FATAL  = 1,    //!< fatal
+    LOG_ERROR  = 2,    //!< error
+    LOG_WARN   = 3,    //!< warning
+    LOG_INFO   = 4,    //!< info
+    LOG_DEBUG  = 5,    //!< debug
+    LOG_TRACE  = 6,    //!< trace
+};
+
+void __log_print(int lv, const char* tag, const char* funame, int line, const char *fmt, ...);
+
+#define LOGI(TAG, ...)  __log_print(LogLevel::LOG_INFO,  TAG, __FUNCTION__, __LINE__, __VA_ARGS__)
+#define LOGW(TAG, ...)  __log_print(LogLevel::LOG_WARN,  TAG, __FUNCTION__, __LINE__, __VA_ARGS__)
+#define LOGE(TAG, ...)  __log_print(LogLevel::LOG_ERROR, TAG, __FUNCTION__, __LINE__, __VA_ARGS__)
+#define LOGF(TAG, ...)  __log_print(LogLevel::LOG_FATAL, TAG, __FUNCTION__, __LINE__, __VA_ARGS__)
+
+#if defined(__ANDROID__)
+#if(LOG_OPEN == 1)
+#define LOGD(TAG,...)  __log_print(LogLevel::LOG_DEBUG, TAG, __FUNCTION__, __LINE__, __VA_ARGS__)
+#else
+#define LOGD(TAG, ...)  ((void)0)
+#endif
+#else
+#define LOGD(TAG, ...)  __log_print(LogLevel::LOG_DEBUG, TAG, __FUNCTION__, __LINE__, __VA_ARGS__)
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif // !__GPLAYER_LOG_H__
--- a/duix-sdk/src/main/cpp/dhcore/atomicops.h
+++ b/duix-sdk/src/main/cpp/dhcore/atomicops.h
@ -0,0 +1,761 @@
+// ©2013-2016 Cameron Desrochers.
+// Distributed under the simplified BSD license (see the license file that
+// should have come with this header).
+// Uses Jeff Preshing's semaphore implementation (under the terms of its
+// separate zlib license, embedded below).
+
+#pragma once
+
+// Provides portable (VC++2010+, Intel ICC 13, GCC 4.7+, and anything C++11 compliant) implementation
+// of low-level memory barriers, plus a few semi-portable utility macros (for inlining and alignment).
+// Also has a basic atomic type (limited to hardware-supported atomics with no memory ordering guarantees).
+// Uses the AE_* prefix for macros (historical reasons), and the "moodycamel" namespace for symbols.
+
+#include <cerrno>
+#include <cassert>
+#include <type_traits>
+#include <cerrno>
+#include <cstdint>
+#include <ctime>
+
+// Platform detection
+#if defined(__INTEL_COMPILER)
+#define AE_ICC
+#elif defined(_MSC_VER)
+#define AE_VCPP
+#elif defined(__GNUC__)
+#define AE_GCC
+#endif
+
+#if defined(_M_IA64) || defined(__ia64__)
+#define AE_ARCH_IA64
+#elif defined(_WIN64) || defined(__amd64__) || defined(_M_X64) || defined(__x86_64__)
+#define AE_ARCH_X64
+#elif defined(_M_IX86) || defined(__i386__)
+#define AE_ARCH_X86
+#elif defined(_M_PPC) || defined(__powerpc__)
+#define AE_ARCH_PPC
+#else
+#define AE_ARCH_UNKNOWN
+#endif
+
+
+// AE_UNUSED
+#define AE_UNUSED(x) ((void)x)
+
+// AE_NO_TSAN/AE_TSAN_ANNOTATE_*
+#if defined(__has_feature)
+#if __has_feature(thread_sanitizer)
+#if __cplusplus >= 201703L  // inline variables require C++17
+namespace moodycamel { inline int ae_tsan_global; }
+#define AE_TSAN_ANNOTATE_RELEASE() AnnotateHappensBefore(__FILE__, __LINE__, (void *)(&::moodycamel::ae_tsan_global))
+#define AE_TSAN_ANNOTATE_ACQUIRE() AnnotateHappensAfter(__FILE__, __LINE__, (void *)(&::moodycamel::ae_tsan_global))
+extern "C" void AnnotateHappensBefore(const char*, int, void*);
+extern "C" void AnnotateHappensAfter(const char*, int, void*);
+#else  // when we can't work with tsan, attempt to disable its warnings
+#define AE_NO_TSAN __attribute__((no_sanitize("thread")))
+#endif
+#endif
+#endif
+#ifndef AE_NO_TSAN
+#define AE_NO_TSAN
+#endif
+#ifndef AE_TSAN_ANNOTATE_RELEASE
+#define AE_TSAN_ANNOTATE_RELEASE()
+#define AE_TSAN_ANNOTATE_ACQUIRE()
+#endif
+
+
+// AE_FORCEINLINE
+#if defined(AE_VCPP) || defined(AE_ICC)
+#define AE_FORCEINLINE __forceinline
+#elif defined(AE_GCC)
+//#define AE_FORCEINLINE __attribute__((always_inline)) 
+#define AE_FORCEINLINE inline
+#else
+#define AE_FORCEINLINE inline
+#endif
+
+
+// AE_ALIGN
+#if defined(AE_VCPP) || defined(AE_ICC)
+#define AE_ALIGN(x) __declspec(align(x))
+#elif defined(AE_GCC)
+#define AE_ALIGN(x) __attribute__((aligned(x)))
+#else
+// Assume GCC compliant syntax...
+#define AE_ALIGN(x) __attribute__((aligned(x)))
+#endif
+
+
+// Portable atomic fences implemented below:
+
+namespace moodycamel {
+
+enum memory_order {
+	memory_order_relaxed,
+	memory_order_acquire,
+	memory_order_release,
+	memory_order_acq_rel,
+	memory_order_seq_cst,
+
+	// memory_order_sync: Forces a full sync:
+	// #LoadLoad, #LoadStore, #StoreStore, and most significantly, #StoreLoad
+	memory_order_sync = memory_order_seq_cst
+};
+
+}    // end namespace moodycamel
+
+#if (defined(AE_VCPP) && (_MSC_VER < 1700 || defined(__cplusplus_cli))) || (defined(AE_ICC) && __INTEL_COMPILER < 1600)
+// VS2010 and ICC13 don't support std::atomic_*_fence, implement our own fences
+
+#include <intrin.h>
+
+#if defined(AE_ARCH_X64) || defined(AE_ARCH_X86)
+#define AeFullSync _mm_mfence
+#define AeLiteSync _mm_mfence
+#elif defined(AE_ARCH_IA64)
+#define AeFullSync __mf
+#define AeLiteSync __mf
+#elif defined(AE_ARCH_PPC)
+#include <ppcintrinsics.h>
+#define AeFullSync __sync
+#define AeLiteSync __lwsync
+#endif
+
+
+#ifdef AE_VCPP
+#pragma warning(push)
+#pragma warning(disable: 4365)		// Disable erroneous 'conversion from long to unsigned int, signed/unsigned mismatch' error when using `assert`
+#ifdef __cplusplus_cli
+#pragma managed(push, off)
+#endif
+#endif
+
+namespace moodycamel {
+
+AE_FORCEINLINE void compiler_fence(memory_order order) AE_NO_TSAN
+{
+	switch (order) {
+		case memory_order_relaxed: break;
+		case memory_order_acquire: _ReadBarrier(); break;
+		case memory_order_release: _WriteBarrier(); break;
+		case memory_order_acq_rel: _ReadWriteBarrier(); break;
+		case memory_order_seq_cst: _ReadWriteBarrier(); break;
+		default: assert(false);
+	}
+}
+
+// x86/x64 have a strong memory model -- all loads and stores have
+// acquire and release semantics automatically (so only need compiler
+// barriers for those).
+#if defined(AE_ARCH_X86) || defined(AE_ARCH_X64)
+AE_FORCEINLINE void fence(memory_order order) AE_NO_TSAN
+{
+	switch (order) {
+		case memory_order_relaxed: break;
+		case memory_order_acquire: _ReadBarrier(); break;
+		case memory_order_release: _WriteBarrier(); break;
+		case memory_order_acq_rel: _ReadWriteBarrier(); break;
+		case memory_order_seq_cst:
+			_ReadWriteBarrier();
+			AeFullSync();
+			_ReadWriteBarrier();
+			break;
+		default: assert(false);
+	}
+}
+#else
+AE_FORCEINLINE void fence(memory_order order) AE_NO_TSAN
+{
+	// Non-specialized arch, use heavier memory barriers everywhere just in case :-(
+	switch (order) {
+		case memory_order_relaxed:
+			break;
+		case memory_order_acquire:
+			_ReadBarrier();
+			AeLiteSync();
+			_ReadBarrier();
+			break;
+		case memory_order_release:
+			_WriteBarrier();
+			AeLiteSync();
+			_WriteBarrier();
+			break;
+		case memory_order_acq_rel:
+			_ReadWriteBarrier();
+			AeLiteSync();
+			_ReadWriteBarrier();
+			break;
+		case memory_order_seq_cst:
+			_ReadWriteBarrier();
+			AeFullSync();
+			_ReadWriteBarrier();
+			break;
+		default: assert(false);
+	}
+}
+#endif
+}    // end namespace moodycamel
+#else
+// Use standard library of atomics
+#include <atomic>
+
+namespace moodycamel {
+
+AE_FORCEINLINE void compiler_fence(memory_order order) AE_NO_TSAN
+{
+	switch (order) {
+		case memory_order_relaxed: break;
+		case memory_order_acquire: std::atomic_signal_fence(std::memory_order_acquire); break;
+		case memory_order_release: std::atomic_signal_fence(std::memory_order_release); break;
+		case memory_order_acq_rel: std::atomic_signal_fence(std::memory_order_acq_rel); break;
+		case memory_order_seq_cst: std::atomic_signal_fence(std::memory_order_seq_cst); break;
+		default: assert(false);
+	}
+}
+
+AE_FORCEINLINE void fence(memory_order order) AE_NO_TSAN
+{
+	switch (order) {
+		case memory_order_relaxed: break;
+		case memory_order_acquire: AE_TSAN_ANNOTATE_ACQUIRE(); std::atomic_thread_fence(std::memory_order_acquire); break;
+		case memory_order_release: AE_TSAN_ANNOTATE_RELEASE(); std::atomic_thread_fence(std::memory_order_release); break;
+		case memory_order_acq_rel: AE_TSAN_ANNOTATE_ACQUIRE(); AE_TSAN_ANNOTATE_RELEASE(); std::atomic_thread_fence(std::memory_order_acq_rel); break;
+		case memory_order_seq_cst: AE_TSAN_ANNOTATE_ACQUIRE(); AE_TSAN_ANNOTATE_RELEASE(); std::atomic_thread_fence(std::memory_order_seq_cst); break;
+		default: assert(false);
+	}
+}
+
+}    // end namespace moodycamel
+
+#endif
+
+
+#if !defined(AE_VCPP) || (_MSC_VER >= 1700 && !defined(__cplusplus_cli))
+#define AE_USE_STD_ATOMIC_FOR_WEAK_ATOMIC
+#endif
+
+#ifdef AE_USE_STD_ATOMIC_FOR_WEAK_ATOMIC
+#include <atomic>
+#endif
+#include <utility>
+
+// WARNING: *NOT* A REPLACEMENT FOR std::atomic. READ CAREFULLY:
+// Provides basic support for atomic variables -- no memory ordering guarantees are provided.
+// The guarantee of atomicity is only made for types that already have atomic load and store guarantees
+// at the hardware level -- on most platforms this generally means aligned pointers and integers (only).
+namespace moodycamel {
+template<typename T>
+class weak_atomic
+{
+public:
+	AE_NO_TSAN weak_atomic() : value() { }
+#ifdef AE_VCPP
+#pragma warning(push)
+#pragma warning(disable: 4100)		// Get rid of (erroneous) 'unreferenced formal parameter' warning
+#endif
+	template<typename U> AE_NO_TSAN weak_atomic(U&& x) : value(std::forward<U>(x)) {  }
+#ifdef __cplusplus_cli
+	// Work around bug with universal reference/nullptr combination that only appears when /clr is on
+	AE_NO_TSAN weak_atomic(nullptr_t) : value(nullptr) {  }
+#endif
+	AE_NO_TSAN weak_atomic(weak_atomic const& other) : value(other.load()) {  }
+	AE_NO_TSAN weak_atomic(weak_atomic&& other) : value(std::move(other.load())) {  }
+#ifdef AE_VCPP
+#pragma warning(pop)
+#endif
+
+	AE_FORCEINLINE operator T() const AE_NO_TSAN { return load(); }
+
+	
+#ifndef AE_USE_STD_ATOMIC_FOR_WEAK_ATOMIC
+	template<typename U> AE_FORCEINLINE weak_atomic const& operator=(U&& x) AE_NO_TSAN { value = std::forward<U>(x); return *this; }
+	AE_FORCEINLINE weak_atomic const& operator=(weak_atomic const& other) AE_NO_TSAN { value = other.value; return *this; }
+	
+	AE_FORCEINLINE T load() const AE_NO_TSAN { return value; }
+	
+	AE_FORCEINLINE T fetch_add_acquire(T increment) AE_NO_TSAN
+	{
+#if defined(AE_ARCH_X64) || defined(AE_ARCH_X86)
+		if (sizeof(T) == 4) return _InterlockedExchangeAdd((long volatile*)&value, (long)increment);
+#if defined(_M_AMD64)
+		else if (sizeof(T) == 8) return _InterlockedExchangeAdd64((long long volatile*)&value, (long long)increment);
+#endif
+#else
+#error Unsupported platform
+#endif
+		assert(false && "T must be either a 32 or 64 bit type");
+		return value;
+	}
+	
+	AE_FORCEINLINE T fetch_add_release(T increment) AE_NO_TSAN
+	{
+#if defined(AE_ARCH_X64) || defined(AE_ARCH_X86)
+		if (sizeof(T) == 4) return _InterlockedExchangeAdd((long volatile*)&value, (long)increment);
+#if defined(_M_AMD64)
+		else if (sizeof(T) == 8) return _InterlockedExchangeAdd64((long long volatile*)&value, (long long)increment);
+#endif
+#else
+#error Unsupported platform
+#endif
+		assert(false && "T must be either a 32 or 64 bit type");
+		return value;
+	}
+#else
+	template<typename U>
+	AE_FORCEINLINE weak_atomic const& operator=(U&& x) AE_NO_TSAN
+	{
+		value.store(std::forward<U>(x), std::memory_order_relaxed);
+		return *this;
+	}
+	
+	AE_FORCEINLINE weak_atomic const& operator=(weak_atomic const& other) AE_NO_TSAN
+	{
+		value.store(other.value.load(std::memory_order_relaxed), std::memory_order_relaxed);
+		return *this;
+	}
+
+	AE_FORCEINLINE T load() const AE_NO_TSAN { return value.load(std::memory_order_relaxed); }
+	
+	AE_FORCEINLINE T fetch_add_acquire(T increment) AE_NO_TSAN
+	{
+		return value.fetch_add(increment, std::memory_order_acquire);
+	}
+	
+	AE_FORCEINLINE T fetch_add_release(T increment) AE_NO_TSAN
+	{
+		return value.fetch_add(increment, std::memory_order_release);
+	}
+#endif
+	
+
+private:
+#ifndef AE_USE_STD_ATOMIC_FOR_WEAK_ATOMIC
+	// No std::atomic support (older MSVC or C++/CLI), but we still need to stop the
+	// compiler from caching or eliding accesses. `volatile` forces every access to go
+	// to memory: slower, but reliable for that purpose.
+	volatile T value;
+#else
+	std::atomic<T> value;
+#endif
+};
+
+}	// end namespace moodycamel
+
+
+
+// Portable single-producer, single-consumer semaphore below:
+
+#if defined(_WIN32)
+// Avoid including windows.h in a header; we only need a handful of
+// items, so we'll redeclare them here (this is relatively safe since
+// the API generally has to remain stable between Windows versions).
+// I know this is an ugly hack but it still beats polluting the global
+// namespace with thousands of generic names or adding a .cpp for nothing.
+extern "C" {
+	struct _SECURITY_ATTRIBUTES;
+	__declspec(dllimport) void* __stdcall CreateSemaphoreW(_SECURITY_ATTRIBUTES* lpSemaphoreAttributes, long lInitialCount, long lMaximumCount, const wchar_t* lpName);
+	__declspec(dllimport) int __stdcall CloseHandle(void* hObject);
+	__declspec(dllimport) unsigned long __stdcall WaitForSingleObject(void* hHandle, unsigned long dwMilliseconds);
+	__declspec(dllimport) int __stdcall ReleaseSemaphore(void* hSemaphore, long lReleaseCount, long* lpPreviousCount);
+}
+#elif defined(__MACH__)
+#include <mach/mach.h>
+#elif defined(__unix__)
+#include <semaphore.h>
+#elif defined(FREERTOS)
+#include <FreeRTOS.h>
+#include <semphr.h>
+#include <task.h>
+#endif
+
+namespace moodycamel
+{
+	// Code in the spsc_sema namespace below is an adaptation of Jeff Preshing's
+	// portable + lightweight semaphore implementations, originally from
+	// https://github.com/preshing/cpp11-on-multicore/blob/master/common/sema.h
+	// LICENSE:
+	// Copyright (c) 2015 Jeff Preshing
+	//
+	// This software is provided 'as-is', without any express or implied
+	// warranty. In no event will the authors be held liable for any damages
+	// arising from the use of this software.
+	//
+	// Permission is granted to anyone to use this software for any purpose,
+	// including commercial applications, and to alter it and redistribute it
+	// freely, subject to the following restrictions:
+	//
+	// 1. The origin of this software must not be misrepresented; you must not
+	//    claim that you wrote the original software. If you use this software
+	//    in a product, an acknowledgement in the product documentation would be
+	//    appreciated but is not required.
+	// 2. Altered source versions must be plainly marked as such, and must not be
+	//    misrepresented as being the original software.
+	// 3. This notice may not be removed or altered from any source distribution.
+	namespace spsc_sema
+	{
+#if defined(_WIN32)
+		class Semaphore
+		{
+		private:
+		    void* m_hSema;
+		    
+		    Semaphore(const Semaphore& other);
+		    Semaphore& operator=(const Semaphore& other);
+
+		public:
+		    AE_NO_TSAN Semaphore(int initialCount = 0) : m_hSema()
+		    {
+		        assert(initialCount >= 0);
+		        const long maxLong = 0x7fffffff;
+		        m_hSema = CreateSemaphoreW(nullptr, initialCount, maxLong, nullptr);
+		        assert(m_hSema);
+		    }
+
+		    AE_NO_TSAN ~Semaphore()
+		    {
+		        CloseHandle(m_hSema);
+		    }
+
+		    bool wait() AE_NO_TSAN
+		    {
+		    	const unsigned long infinite = 0xffffffff;
+		        return WaitForSingleObject(m_hSema, infinite) == 0;
+		    }
+
+			bool try_wait() AE_NO_TSAN
+			{
+				return WaitForSingleObject(m_hSema, 0) == 0;
+			}
+
+			bool timed_wait(std::uint64_t usecs) AE_NO_TSAN
+			{
+				return WaitForSingleObject(m_hSema, (unsigned long)(usecs / 1000)) == 0;
+			}
+
+		    void signal(int count = 1) AE_NO_TSAN
+		    {
+		        while (!ReleaseSemaphore(m_hSema, count, nullptr));
+		    }
+		};
+#elif defined(__MACH__)
+		//---------------------------------------------------------
+		// Semaphore (Apple iOS and OSX)
+		// Can't use POSIX semaphores here; in particular, unnamed semaphores (sem_init) are not
+		// supported on Apple platforms. Background: http://lists.apple.com/archives/darwin-kernel/2009/Apr/msg00010.html
+		//---------------------------------------------------------
+		class Semaphore
+		{
+		private:
+		    semaphore_t m_sema;
+
+		    Semaphore(const Semaphore& other);
+		    Semaphore& operator=(const Semaphore& other);
+
+		public:
+		    AE_NO_TSAN Semaphore(int initialCount = 0) : m_sema()
+		    {
+		        assert(initialCount >= 0);
+		        kern_return_t rc = semaphore_create(mach_task_self(), &m_sema, SYNC_POLICY_FIFO, initialCount);
+		        assert(rc == KERN_SUCCESS);
+		        AE_UNUSED(rc);
+		    }
+
+		    AE_NO_TSAN ~Semaphore()
+		    {
+		        semaphore_destroy(mach_task_self(), m_sema);
+		    }
+
+		    bool wait() AE_NO_TSAN
+		    {
+		        return semaphore_wait(m_sema) == KERN_SUCCESS;
+		    }
+
+			bool try_wait() AE_NO_TSAN
+			{
+				return timed_wait(0);
+			}
+
+			bool timed_wait(std::uint64_t timeout_usecs) AE_NO_TSAN
+			{
+				mach_timespec_t ts;
+				ts.tv_sec = static_cast<unsigned int>(timeout_usecs / 1000000);
+				ts.tv_nsec = static_cast<int>((timeout_usecs % 1000000) * 1000);
+
+				// added in OSX 10.10: https://developer.apple.com/library/prerelease/mac/documentation/General/Reference/APIDiffsMacOSX10_10SeedDiff/modules/Darwin.html
+				kern_return_t rc = semaphore_timedwait(m_sema, ts);
+				return rc == KERN_SUCCESS;
+			}
+
+		    void signal() AE_NO_TSAN
+		    {
+		        while (semaphore_signal(m_sema) != KERN_SUCCESS);
+		    }
+
+		    void signal(int count) AE_NO_TSAN
+		    {
+		        while (count-- > 0)
+		        {
+		            while (semaphore_signal(m_sema) != KERN_SUCCESS);
+		        }
+		    }
+		};
+#elif defined(__unix__)
+		//---------------------------------------------------------
+		// Semaphore (POSIX, Linux)
+		//---------------------------------------------------------
+		class Semaphore
+		{
+		private:
+		    sem_t m_sema;
+
+		    Semaphore(const Semaphore& other);
+		    Semaphore& operator=(const Semaphore& other);
+
+		public:
+		    AE_NO_TSAN Semaphore(int initialCount = 0) : m_sema()
+		    {
+		        assert(initialCount >= 0);
+		        int rc = sem_init(&m_sema, 0, static_cast<unsigned int>(initialCount));
+		        assert(rc == 0);
+		        AE_UNUSED(rc);
+		    }
+
+		    AE_NO_TSAN ~Semaphore()
+		    {
+		        sem_destroy(&m_sema);
+		    }
+
+		    bool wait() AE_NO_TSAN
+		    {
+		        // http://stackoverflow.com/questions/2013181/gdb-causes-sem-wait-to-fail-with-eintr-error
+		        int rc;
+		        do
+		        {
+		            rc = sem_wait(&m_sema);
+		        }
+		        while (rc == -1 && errno == EINTR);
+		        return rc == 0;
+		    }
+
+			bool try_wait() AE_NO_TSAN
+			{
+				int rc;
+				do {
+					rc = sem_trywait(&m_sema);
+				} while (rc == -1 && errno == EINTR);
+				return rc == 0;
+			}
+
+			bool timed_wait(std::uint64_t usecs) AE_NO_TSAN
+			{
+				struct timespec ts;
+				const int usecs_in_1_sec = 1000000;
+				const int nsecs_in_1_sec = 1000000000;
+				clock_gettime(CLOCK_REALTIME, &ts);
+				ts.tv_sec += static_cast<time_t>(usecs / usecs_in_1_sec);
+				ts.tv_nsec += static_cast<long>(usecs % usecs_in_1_sec) * 1000;
+				// sem_timedwait fails with EINVAL if tv_nsec is not in [0, 1e9),
+				// so normalize the timespec before passing it in
+				if (ts.tv_nsec >= nsecs_in_1_sec) {
+					ts.tv_nsec -= nsecs_in_1_sec;
+					++ts.tv_sec;
+				}
+
+				int rc;
+				do {
+					rc = sem_timedwait(&m_sema, &ts);
+				} while (rc == -1 && errno == EINTR);
+				return rc == 0;
+			}
+
+		    void signal() AE_NO_TSAN
+		    {
+		        while (sem_post(&m_sema) == -1);
+		    }
+
+		    void signal(int count) AE_NO_TSAN
+		    {
+		        while (count-- > 0)
+		        {
+		            while (sem_post(&m_sema) == -1);
+		        }
+		    }
+		};
+#elif defined(FREERTOS)
+		//---------------------------------------------------------
+		// Semaphore (FreeRTOS)
+		//---------------------------------------------------------
+		class Semaphore
+		{
+		private:
+			SemaphoreHandle_t m_sema;
+
+			Semaphore(const Semaphore& other);
+			Semaphore& operator=(const Semaphore& other);
+
+		public:
+			AE_NO_TSAN Semaphore(int initialCount = 0) : m_sema()
+			{
+				assert(initialCount >= 0);
+				m_sema = xSemaphoreCreateCounting(static_cast<UBaseType_t>(~0ull), static_cast<UBaseType_t>(initialCount));
+				assert(m_sema);
+			}
+
+			AE_NO_TSAN ~Semaphore()
+			{
+				vSemaphoreDelete(m_sema);
+			}
+
+			bool wait() AE_NO_TSAN
+			{
+				return xSemaphoreTake(m_sema, portMAX_DELAY) == pdTRUE;
+			}
+
+			bool try_wait() AE_NO_TSAN
+			{
+				// Note: In an ISR context, if this causes a task to unblock,
+				// the caller won't know about it
+				if (xPortIsInsideInterrupt())
+					return xSemaphoreTakeFromISR(m_sema, NULL) == pdTRUE;
+				return xSemaphoreTake(m_sema, 0) == pdTRUE;
+			}
+
+			bool timed_wait(std::uint64_t usecs) AE_NO_TSAN
+			{
+				std::uint64_t msecs = usecs / 1000;
+				TickType_t ticks = static_cast<TickType_t>(msecs / portTICK_PERIOD_MS);
+				if (ticks == 0)
+					return try_wait();
+				return xSemaphoreTake(m_sema, ticks) == pdTRUE;
+			}
+
+			void signal() AE_NO_TSAN
+			{
+				// Note: In an ISR context, if this causes a task to unblock,
+				// the caller won't know about it
+				BaseType_t rc;
+				if (xPortIsInsideInterrupt())
+					rc = xSemaphoreGiveFromISR(m_sema, NULL);
+				else
+					rc = xSemaphoreGive(m_sema);
+				assert(rc == pdTRUE);
+				AE_UNUSED(rc);
+			}
+
+			void signal(int count) AE_NO_TSAN
+			{
+				while (count-- > 0)
+					signal();
+			}
+		};
+#else
+#error Unsupported platform! (No semaphore wrapper available)
+#endif
+
+		//---------------------------------------------------------
+		// LightweightSemaphore
+		//---------------------------------------------------------
+		class LightweightSemaphore
+		{
+		public:
+			typedef std::make_signed<std::size_t>::type ssize_t;
+			
+		private:
+		    weak_atomic<ssize_t> m_count;
+		    Semaphore m_sema;
+
+		    bool waitWithPartialSpinning(std::int64_t timeout_usecs = -1) AE_NO_TSAN
+		    {
+		        ssize_t oldCount;
+		        // Is there a better way to set the initial spin count?
+		        // If we lower it to 1000, testBenaphore becomes 15x slower on my Core i7-5930K Windows PC,
+		        // as threads start hitting the kernel semaphore.
+		        int spin = 1024;
+		        while (--spin >= 0)
+		        {
+		            if (m_count.load() > 0)
+		            {
+		                m_count.fetch_add_acquire(-1);
+		                return true;
+		            }
+		            compiler_fence(memory_order_acquire);     // Prevent the compiler from collapsing the loop.
+		        }
+		        oldCount = m_count.fetch_add_acquire(-1);
+				if (oldCount > 0)
+					return true;
+		        if (timeout_usecs < 0)
+				{
+					if (m_sema.wait())
+						return true;
+				}
+				if (timeout_usecs > 0 && m_sema.timed_wait(static_cast<uint64_t>(timeout_usecs)))
+					return true;
+				// At this point, we've timed out waiting for the semaphore, but the
+				// count is still decremented, indicating that we may still be waiting
+				// on it. So we have to re-adjust the count, but only if the semaphore
+				// hasn't been signaled for us in the meantime. If it has, we need to
+				// consume that signal from the semaphore instead.
+				while (true)
+				{
+					oldCount = m_count.fetch_add_release(1);
+					if (oldCount < 0)
+						return false;    // successfully restored things to the way they were
+					// Oh, the producer thread just signaled the semaphore after all. Try again:
+					oldCount = m_count.fetch_add_acquire(-1);
+					if (oldCount > 0 && m_sema.try_wait())
+						return true;
+				}
+		    }
+
+		public:
+		    AE_NO_TSAN LightweightSemaphore(ssize_t initialCount = 0) : m_count(initialCount), m_sema()
+		    {
+		        assert(initialCount >= 0);
+		    }
+
+		    bool tryWait() AE_NO_TSAN
+		    {
+		        if (m_count.load() > 0)
+		        {
+		        	m_count.fetch_add_acquire(-1);
+		        	return true;
+		        }
+		        return false;
+		    }
+
+		    bool wait() AE_NO_TSAN
+		    {
+		        return tryWait() || waitWithPartialSpinning();
+		    }
+
+			bool wait(std::int64_t timeout_usecs) AE_NO_TSAN
+			{
+				return tryWait() || waitWithPartialSpinning(timeout_usecs);
+			}
+
+		    void signal(ssize_t count = 1) AE_NO_TSAN
+		    {
+		    	assert(count >= 0);
+		        ssize_t oldCount = m_count.fetch_add_release(count);
+		        assert(oldCount >= -1);
+		        if (oldCount < 0)
+		        {
+		            m_sema.signal(1);
+		        }
+		    }
+		    
+		    std::size_t availableApprox() const AE_NO_TSAN
+		    {
+		    	ssize_t count = m_count.load();
+		    	return count > 0 ? static_cast<std::size_t>(count) : 0;
+		    }
+		};
+	}	// end namespace spsc_sema
+}	// end namespace moodycamel
+
+#if defined(AE_VCPP) && (_MSC_VER < 1700 || defined(__cplusplus_cli))
+#pragma warning(pop)
+#ifdef __cplusplus_cli
+#pragma managed(pop)
+#endif
+#endif
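The `spsc_sema::LightweightSemaphore` above puts a signed atomic count in front of the OS semaphore: a positive count means permits can be taken without a system call, while a negative count means a waiter has committed to blocking in the kernel, so `signal()` only calls into the OS when `fetch_add` observes a value below zero. The following is a minimal sketch of that technique in portable C++11, under stated assumptions: `SlowSemaphore` and `Benaphore` are hypothetical names, a `std::mutex`/`std::condition_variable` pair stands in for the platform semaphore, and `tryWait()` uses a compare-and-swap loop so it stays correct with several consumers (the original's load-then-decrement is only safe for a single consumer). Timed waits and the timeout recovery loop are omitted.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the platform semaphore (CreateSemaphoreW, sem_t, ...).
// Every operation takes a lock: this is the "slow path" the atomic count avoids.
class SlowSemaphore {
public:
    void wait() {
        std::unique_lock<std::mutex> lock(mutex_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
    void signal() {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            ++count_;
        }
        cv_.notify_one();
    }

private:
    std::mutex mutex_;
    std::condition_variable cv_;
    int count_ = 0;
};

// Hypothetical benaphore-style semaphore (illustration, not the moodycamel class).
// count_ > 0 : that many permits can be taken without touching SlowSemaphore.
// count_ < 0 : -count_ threads are blocked (or about to block) in SlowSemaphore.
class Benaphore {
public:
    explicit Benaphore(long initial = 0) : count_(initial) {}

    // Fast path only: take a permit if one is available, never block.
    bool tryWait() {
        long c = count_.load(std::memory_order_relaxed);
        while (c > 0) {
            // On failure, compare_exchange_weak reloads c with the current value.
            if (count_.compare_exchange_weak(c, c - 1, std::memory_order_acquire,
                                             std::memory_order_relaxed))
                return true;
        }
        return false;
    }

    void wait() {
        // Spin briefly in the hope that a producer signals soon.
        for (int spin = 0; spin < 1024; ++spin) {
            if (tryWait()) return;
        }
        // Commit to waiting: a positive old value means we got a permit after all.
        if (count_.fetch_sub(1, std::memory_order_acquire) > 0) return;
        sema_.wait();  // count is now negative, so signal() will wake us
    }

    void signal() {
        // An old value below zero means a waiter has already committed to blocking.
        if (count_.fetch_add(1, std::memory_order_release) < 0) sema_.signal();
    }

    long availableApprox() const {
        long c = count_.load(std::memory_order_relaxed);
        return c > 0 ? c : 0;
    }

private:
    std::atomic<long> count_;
    SlowSemaphore sema_;
};
```

A consumer calls `wait()` once per item and a producer calls `signal()` once per item. While the producer stays ahead, the count remains positive and the consumer never touches the mutex; only when the consumer runs dry does it pay for the blocking path. The spin count of 1024 mirrors the value used in `waitWithPartialSpinning`.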
--- a/duix-sdk/src/main/cpp/dhcore/blockingconcurrentqueue.h
+++ b/duix-sdk/src/main/cpp/dhcore/blockingconcurrentqueue.h
@ -0,0 +1,582 @@
+// Provides an efficient blocking version of moodycamel::ConcurrentQueue.
+// ©2015-2020 Cameron Desrochers. Distributed under the terms of the simplified
+// BSD license, available at the top of concurrentqueue.h.
+// Also dual-licensed under the Boost Software License (see LICENSE.md)
+// Uses Jeff Preshing's semaphore implementation (under the terms of its
+// separate zlib license, see lightweightsemaphore.h).
+
+#pragma once
+
+#include "concurrentqueue.h"
+#include "lightweightsemaphore.h"
+
+#include <type_traits>
+#include <cerrno>
+#include <memory>
+#include <chrono>
+#include <ctime>
+
+namespace moodycamel
+{
+// This is a blocking version of the queue. It has an almost identical interface to
+// the normal non-blocking version, with the addition of various wait_dequeue() methods
+// and the removal of producer-specific dequeue methods.
+template<typename T, typename Traits = ConcurrentQueueDefaultTraits>
+class BlockingConcurrentQueue
+{
+private:
+	typedef ::moodycamel::ConcurrentQueue<T, Traits> ConcurrentQueue;
+	typedef ::moodycamel::LightweightSemaphore LightweightSemaphore;
+
+public:
+	typedef typename ConcurrentQueue::producer_token_t producer_token_t;
+	typedef typename ConcurrentQueue::consumer_token_t consumer_token_t;
+	
+	typedef typename ConcurrentQueue::index_t index_t;
+	typedef typename ConcurrentQueue::size_t size_t;
+	typedef typename std::make_signed<size_t>::type ssize_t;
+	
+	static const size_t BLOCK_SIZE = ConcurrentQueue::BLOCK_SIZE;
+	static const size_t EXPLICIT_BLOCK_EMPTY_COUNTER_THRESHOLD = ConcurrentQueue::EXPLICIT_BLOCK_EMPTY_COUNTER_THRESHOLD;
+	static const size_t EXPLICIT_INITIAL_INDEX_SIZE = ConcurrentQueue::EXPLICIT_INITIAL_INDEX_SIZE;
+	static const size_t IMPLICIT_INITIAL_INDEX_SIZE = ConcurrentQueue::IMPLICIT_INITIAL_INDEX_SIZE;
+	static const size_t INITIAL_IMPLICIT_PRODUCER_HASH_SIZE = ConcurrentQueue::INITIAL_IMPLICIT_PRODUCER_HASH_SIZE;
+	static const std::uint32_t EXPLICIT_CONSUMER_CONSUMPTION_QUOTA_BEFORE_ROTATE = ConcurrentQueue::EXPLICIT_CONSUMER_CONSUMPTION_QUOTA_BEFORE_ROTATE;
+	static const size_t MAX_SUBQUEUE_SIZE = ConcurrentQueue::MAX_SUBQUEUE_SIZE;
+	
+public:
+	// Creates a queue with at least `capacity` element slots; note that the
+	// actual number of elements that can be inserted without additional memory
+	// allocation depends on the number of producers and the block size (e.g. if
+	// the block size is equal to `capacity`, only a single block will be allocated
+	// up-front, which means only a single producer will be able to enqueue elements
+	// without an extra allocation -- blocks aren't shared between producers).
+	// This method is not thread safe -- it is up to the user to ensure that the
+	// queue is fully constructed before it starts being used by other threads (this
+	// includes making the memory effects of construction visible, possibly with a
+	// memory barrier).
+	explicit BlockingConcurrentQueue(size_t capacity = 6 * BLOCK_SIZE)
+		: inner(capacity), sema(create<LightweightSemaphore, ssize_t, int>(0, (int)Traits::MAX_SEMA_SPINS), &BlockingConcurrentQueue::template destroy<LightweightSemaphore>)
+	{
+		assert(reinterpret_cast<ConcurrentQueue*>((BlockingConcurrentQueue*)1) == &((BlockingConcurrentQueue*)1)->inner && "BlockingConcurrentQueue must have ConcurrentQueue as its first member");
+		if (!sema) {
+			MOODYCAMEL_THROW(std::bad_alloc());
+		}
+	}
+	
+	BlockingConcurrentQueue(size_t minCapacity, size_t maxExplicitProducers, size_t maxImplicitProducers)
+		: inner(minCapacity, maxExplicitProducers, maxImplicitProducers), sema(create<LightweightSemaphore, ssize_t, int>(0, (int)Traits::MAX_SEMA_SPINS), &BlockingConcurrentQueue::template destroy<LightweightSemaphore>)
+	{
+		assert(reinterpret_cast<ConcurrentQueue*>((BlockingConcurrentQueue*)1) == &((BlockingConcurrentQueue*)1)->inner && "BlockingConcurrentQueue must have ConcurrentQueue as its first member");
+		if (!sema) {
+			MOODYCAMEL_THROW(std::bad_alloc());
+		}
+	}
+	
+	// Disable copying and copy assignment
+	BlockingConcurrentQueue(BlockingConcurrentQueue const&) MOODYCAMEL_DELETE_FUNCTION;
+	BlockingConcurrentQueue& operator=(BlockingConcurrentQueue const&) MOODYCAMEL_DELETE_FUNCTION;
+	
+	// Moving is supported, but note that it is *not* a thread-safe operation.
+	// Nobody can use the queue while it's being moved, and the memory effects
+	// of that move must be propagated to other threads before they can use it.
+	// Note: When a queue is moved, its tokens are still valid but can only be
+	// used with the destination queue (i.e. semantically they are moved along
+	// with the queue itself).
+	BlockingConcurrentQueue(BlockingConcurrentQueue&& other) MOODYCAMEL_NOEXCEPT
+		: inner(std::move(other.inner)), sema(std::move(other.sema))
+	{ }
+	
+	inline BlockingConcurrentQueue& operator=(BlockingConcurrentQueue&& other) MOODYCAMEL_NOEXCEPT
+	{
+		return swap_internal(other);
+	}
+	
+	// Swaps this queue's state with the other's. Not thread-safe.
+	// Swapping two queues does not invalidate their tokens, however
+	// the tokens that were created for one queue must be used with
+	// only the swapped queue (i.e. the tokens are tied to the
+	// queue's movable state, not the object itself).
+	inline void swap(BlockingConcurrentQueue& other) MOODYCAMEL_NOEXCEPT
+	{
+		swap_internal(other);
+	}
+	
+private:
+	BlockingConcurrentQueue& swap_internal(BlockingConcurrentQueue& other)
+	{
+		if (this == &other) {
+			return *this;
+		}
+		
+		inner.swap(other.inner);
+		sema.swap(other.sema);
+		return *this;
+	}
+	
+public:
+	// Enqueues a single item (by copying it).
+	// Allocates memory if required. Only fails if memory allocation fails (or implicit
+	// production is disabled because Traits::INITIAL_IMPLICIT_PRODUCER_HASH_SIZE is 0,
+	// or Traits::MAX_SUBQUEUE_SIZE has been defined and would be surpassed).
+	// Thread-safe.
+	inline bool enqueue(T const& item)
+	{
+		if ((details::likely)(inner.enqueue(item))) {
+			sema->signal();
+			return true;
+		}
+		return false;
+	}
+	
+	// Enqueues a single item (by moving it, if possible).
+	// Allocates memory if required. Only fails if memory allocation fails (or implicit
+	// production is disabled because Traits::INITIAL_IMPLICIT_PRODUCER_HASH_SIZE is 0,
+	// or Traits::MAX_SUBQUEUE_SIZE has been defined and would be surpassed).
+	// Thread-safe.
+	inline bool enqueue(T&& item)
+	{
+		if ((details::likely)(inner.enqueue(std::move(item)))) {
+			sema->signal();
+			return true;
+		}
+		return false;
+	}
+	
+	// Enqueues a single item (by copying it) using an explicit producer token.
+	// Allocates memory if required. Only fails if memory allocation fails (or
+	// Traits::MAX_SUBQUEUE_SIZE has been defined and would be surpassed).
+	// Thread-safe.
+	inline bool enqueue(producer_token_t const& token, T const& item)
+	{
+		if ((details::likely)(inner.enqueue(token, item))) {
+			sema->signal();
+			return true;
+		}
+		return false;
+	}
+	
+	// Enqueues a single item (by moving it, if possible) using an explicit producer token.
+	// Allocates memory if required. Only fails if memory allocation fails (or
+	// Traits::MAX_SUBQUEUE_SIZE has been defined and would be surpassed).
+	// Thread-safe.
+	inline bool enqueue(producer_token_t const& token, T&& item)
+	{
+		if ((details::likely)(inner.enqueue(token, std::move(item)))) {
+			sema->signal();
+			return true;
+		}
+		return false;
+	}
+	
+	// Enqueues several items.
+	// Allocates memory if required. Only fails if memory allocation fails (or
+	// implicit production is disabled because Traits::INITIAL_IMPLICIT_PRODUCER_HASH_SIZE
+	// is 0, or Traits::MAX_SUBQUEUE_SIZE has been defined and would be surpassed).
+	// Note: Use std::make_move_iterator if the elements should be moved instead of copied.
+	// Thread-safe.
+	template<typename It>
+	inline bool enqueue_bulk(It itemFirst, size_t count)
+	{
+		if ((details::likely)(inner.enqueue_bulk(std::forward<It>(itemFirst), count))) {
+			sema->signal((LightweightSemaphore::ssize_t)(ssize_t)count);
+			return true;
+		}
+		return false;
+	}
+	
+	// Enqueues several items using an explicit producer token.
+	// Allocates memory if required. Only fails if memory allocation fails
+	// (or Traits::MAX_SUBQUEUE_SIZE has been defined and would be surpassed).
+	// Note: Use std::make_move_iterator if the elements should be moved
+	// instead of copied.
+	// Thread-safe.
+	template<typename It>
+	inline bool enqueue_bulk(producer_token_t const& token, It itemFirst, size_t count)
+	{
+		if ((details::likely)(inner.enqueue_bulk(token, std::forward<It>(itemFirst), count))) {
+			sema->signal((LightweightSemaphore::ssize_t)(ssize_t)count);
+			return true;
+		}
+		return false;
+	}
+	
+	// Enqueues a single item (by copying it).
+	// Does not allocate memory. Fails if not enough room to enqueue (or implicit
+	// production is disabled because Traits::INITIAL_IMPLICIT_PRODUCER_HASH_SIZE
+	// is 0).
+	// Thread-safe.
+	inline bool try_enqueue(T const& item)
+	{
+		if (inner.try_enqueue(item)) {
+			sema->signal();
+			return true;
+		}
+		return false;
+	}
+	
+	// Enqueues a single item (by moving it, if possible).
+	// Does not allocate memory (except for one-time implicit producer).
+	// Fails if not enough room to enqueue (or implicit production is
+	// disabled because Traits::INITIAL_IMPLICIT_PRODUCER_HASH_SIZE is 0).
+	// Thread-safe.
+	inline bool try_enqueue(T&& item)
+	{
+		if (inner.try_enqueue(std::move(item))) {
+			sema->signal();
+			return true;
+		}
+		return false;
+	}
+	
+	// Enqueues a single item (by copying it) using an explicit producer token.
+	// Does not allocate memory. Fails if not enough room to enqueue.
+	// Thread-safe.
+	inline bool try_enqueue(producer_token_t const& token, T const& item)
+	{
+		if (inner.try_enqueue(token, item)) {
+			sema->signal();
+			return true;
+		}
+		return false;
+	}
+	
+	// Enqueues a single item (by moving it, if possible) using an explicit producer token.
+	// Does not allocate memory. Fails if not enough room to enqueue.
+	// Thread-safe.
+	inline bool try_enqueue(producer_token_t const& token, T&& item)
+	{
+		if (inner.try_enqueue(token, std::move(item))) {
+			sema->signal();
+			return true;
+		}
+		return false;
+	}
+	
+	// Enqueues several items.
+	// Does not allocate memory (except for one-time implicit producer).
+	// Fails if not enough room to enqueue (or implicit production is
+	// disabled because Traits::INITIAL_IMPLICIT_PRODUCER_HASH_SIZE is 0).
+	// Note: Use std::make_move_iterator if the elements should be moved
+	// instead of copied.
+	// Thread-safe.
+	template<typename It>
+	inline bool try_enqueue_bulk(It itemFirst, size_t count)
+	{
+		if (inner.try_enqueue_bulk(std::forward<It>(itemFirst), count)) {
+			sema->signal((LightweightSemaphore::ssize_t)(ssize_t)count);
+			return true;
+		}
+		return false;
+	}
+	
+	// Enqueues several items using an explicit producer token.
+	// Does not allocate memory. Fails if not enough room to enqueue.
+	// Note: Use std::make_move_iterator if the elements should be moved
+	// instead of copied.
+	// Thread-safe.
+	template<typename It>
+	inline bool try_enqueue_bulk(producer_token_t const& token, It itemFirst, size_t count)
+	{
+		if (inner.try_enqueue_bulk(token, std::forward<It>(itemFirst), count)) {
+			sema->signal((LightweightSemaphore::ssize_t)(ssize_t)count);
+			return true;
+		}
+		return false;
+	}
+	
+	
+	// Attempts to dequeue from the queue.
+	// Returns false if all producer streams appeared empty at the time they
+	// were checked (so, the queue is likely but not guaranteed to be empty).
+	// Never allocates. Thread-safe.
+	template<typename U>
+	inline bool try_dequeue(U& item)
+	{
+		if (sema->tryWait()) {
+			while (!inner.try_dequeue(item)) {
+				continue;
+			}
+			return true;
+		}
+		return false;
+	}
+	
+	// Attempts to dequeue from the queue using an explicit consumer token.
+	// Returns false if all producer streams appeared empty at the time they
+	// were checked (so, the queue is likely but not guaranteed to be empty).
+	// Never allocates. Thread-safe.
+	template<typename U>
+	inline bool try_dequeue(consumer_token_t& token, U& item)
+	{
+		if (sema->tryWait()) {
+			while (!inner.try_dequeue(token, item)) {
+				continue;
+			}
+			return true;
+		}
+		return false;
+	}
+	
+	// Attempts to dequeue several elements from the queue.
+	// Returns the number of items actually dequeued.
+	// Returns 0 if all producer streams appeared empty at the time they
+	// were checked (so, the queue is likely but not guaranteed to be empty).
+	// Never allocates. Thread-safe.
+	template<typename It>
+	inline size_t try_dequeue_bulk(It itemFirst, size_t max)
+	{
+		size_t count = 0;
+		max = (size_t)sema->tryWaitMany((LightweightSemaphore::ssize_t)(ssize_t)max);
+		while (count != max) {
+			count += inner.template try_dequeue_bulk<It&>(itemFirst, max - count);
+		}
+		return count;
+	}
+	
+	// Attempts to dequeue several elements from the queue using an explicit consumer token.
+	// Returns the number of items actually dequeued.
+	// Returns 0 if all producer streams appeared empty at the time they
+	// were checked (so, the queue is likely but not guaranteed to be empty).
+	// Never allocates. Thread-safe.
+	template<typename It>
+	inline size_t try_dequeue_bulk(consumer_token_t& token, It itemFirst, size_t max)
+	{
+		size_t count = 0;
+		max = (size_t)sema->tryWaitMany((LightweightSemaphore::ssize_t)(ssize_t)max);
+		while (count != max) {
+			count += inner.template try_dequeue_bulk<It&>(token, itemFirst, max - count);
+		}
+		return count;
+	}
+	
+	
+	
+	// Blocks the current thread until there's something to dequeue, then
+	// dequeues it.
+	// Never allocates. Thread-safe.
+	template<typename U>
+	inline void wait_dequeue(U& item)
+	{
+		while (!sema->wait()) {
+			continue;
+		}
+		while (!inner.try_dequeue(item)) {
+			continue;
+		}
+	}
+
+	// Blocks the current thread until either there's something to dequeue
+	// or the timeout (specified in microseconds) expires. Returns false
+	// without setting `item` if the timeout expires, otherwise assigns
+	// to `item` and returns true.
+	// Using a negative timeout indicates an indefinite timeout,
+	// and is thus functionally equivalent to calling wait_dequeue.
+	// Never allocates. Thread-safe.
+	template<typename U>
+	inline bool wait_dequeue_timed(U& item, std::int64_t timeout_usecs)
+	{
+		if (!sema->wait(timeout_usecs)) {
+			return false;
+		}
+		while (!inner.try_dequeue(item)) {
+			continue;
+		}
+		return true;
+	}
+    
+    // Blocks the current thread until either there's something to dequeue
+	// or the timeout expires. Returns false without setting `item` if the
+    // timeout expires, otherwise assigns to `item` and returns true.
+	// Never allocates. Thread-safe.
+	template<typename U, typename Rep, typename Period>
+	inline bool wait_dequeue_timed(U& item, std::chrono::duration<Rep, Period> const& timeout)
+    {
+        return wait_dequeue_timed(item, std::chrono::duration_cast<std::chrono::microseconds>(timeout).count());
+    }
+	
+	// Blocks the current thread until there's something to dequeue, then
+	// dequeues it using an explicit consumer token.
+	// Never allocates. Thread-safe.
+	template<typename U>
+	inline void wait_dequeue(consumer_token_t& token, U& item)
+	{
+		while (!sema->wait()) {
+			continue;
+		}
+		while (!inner.try_dequeue(token, item)) {
+			continue;
+		}
+	}
+	
+	// Blocks the current thread until either there's something to dequeue
+	// or the timeout (specified in microseconds) expires. Returns false
+	// without setting `item` if the timeout expires, otherwise assigns
+	// to `item` and returns true.
+	// Using a negative timeout indicates an indefinite timeout,
+	// and is thus functionally equivalent to calling wait_dequeue.
+	// Never allocates. Thread-safe.
+	template<typename U>
+	inline bool wait_dequeue_timed(consumer_token_t& token, U& item, std::int64_t timeout_usecs)
+	{
+		if (!sema->wait(timeout_usecs)) {
+			return false;
+		}
+		while (!inner.try_dequeue(token, item)) {
+			continue;
+		}
+		return true;
+	}
+    
+    // Blocks the current thread until either there's something to dequeue
+	// or the timeout expires. Returns false without setting `item` if the
+    // timeout expires, otherwise assigns to `item` and returns true.
+	// Never allocates. Thread-safe.
+	template<typename U, typename Rep, typename Period>
+	inline bool wait_dequeue_timed(consumer_token_t& token, U& item, std::chrono::duration<Rep, Period> const& timeout)
+    {
+        return wait_dequeue_timed(token, item, std::chrono::duration_cast<std::chrono::microseconds>(timeout).count());
+    }
+	
+	// Attempts to dequeue several elements from the queue.
+	// Returns the number of items actually dequeued, which will
+	// always be at least one (this method blocks until the queue
+	// is non-empty) and at most max.
+	// Never allocates. Thread-safe.
+	template<typename It>
+	inline size_t wait_dequeue_bulk(It itemFirst, size_t max)
+	{
+		size_t count = 0;
+		max = (size_t)sema->waitMany((LightweightSemaphore::ssize_t)(ssize_t)max);
+		while (count != max) {
+			count += inner.template try_dequeue_bulk<It&>(itemFirst, max - count);
+		}
+		return count;
+	}
+	
+	// Attempts to dequeue several elements from the queue.
+	// Returns the number of items actually dequeued, which can
+	// be 0 if the timeout expires while waiting for elements,
+	// and at most max.
+	// Using a negative timeout indicates an indefinite timeout,
+	// and is thus functionally equivalent to calling wait_dequeue_bulk.
+	// Never allocates. Thread-safe.
+	template<typename It>
+	inline size_t wait_dequeue_bulk_timed(It itemFirst, size_t max, std::int64_t timeout_usecs)
+	{
+		size_t count = 0;
+		max = (size_t)sema->waitMany((LightweightSemaphore::ssize_t)(ssize_t)max, timeout_usecs);
+		while (count != max) {
+			count += inner.template try_dequeue_bulk<It&>(itemFirst, max - count);
+		}
+		return count;
+	}
+    
+    // Attempts to dequeue several elements from the queue.
+	// Returns the number of items actually dequeued, which can
+	// be 0 if the timeout expires while waiting for elements,
+	// and at most max.
+	// Never allocates. Thread-safe.
+	template<typename It, typename Rep, typename Period>
+	inline size_t wait_dequeue_bulk_timed(It itemFirst, size_t max, std::chrono::duration<Rep, Period> const& timeout)
+    {
+        return wait_dequeue_bulk_timed<It&>(itemFirst, max, std::chrono::duration_cast<std::chrono::microseconds>(timeout).count());
+    }
+	
+	// Attempts to dequeue several elements from the queue using an explicit consumer token.
+	// Returns the number of items actually dequeued, which will
+	// always be at least one (this method blocks until the queue
+	// is non-empty) and at most max.
+	// Never allocates. Thread-safe.
+	template<typename It>
+	inline size_t wait_dequeue_bulk(consumer_token_t& token, It itemFirst, size_t max)
+	{
+		size_t count = 0;
+		max = (size_t)sema->waitMany((LightweightSemaphore::ssize_t)(ssize_t)max);
+		while (count != max) {
+			count += inner.template try_dequeue_bulk<It&>(token, itemFirst, max - count);
+		}
+		return count;
+	}
+	
+	// Attempts to dequeue several elements from the queue using an explicit consumer token.
+	// Returns the number of items actually dequeued, which can
+	// be 0 if the timeout expires while waiting for elements,
+	// and at most max.
+	// Using a negative timeout indicates an indefinite timeout,
+	// and is thus functionally equivalent to calling wait_dequeue_bulk.
+	// Never allocates. Thread-safe.
+	template<typename It>
+	inline size_t wait_dequeue_bulk_timed(consumer_token_t& token, It itemFirst, size_t max, std::int64_t timeout_usecs)
+	{
+		size_t count = 0;
+		max = (size_t)sema->waitMany((LightweightSemaphore::ssize_t)(ssize_t)max, timeout_usecs);
+		while (count != max) {
+			count += inner.template try_dequeue_bulk<It&>(token, itemFirst, max - count);
+		}
+		return count;
+	}
+	
+	// Attempts to dequeue several elements from the queue using an explicit consumer token.
+	// Returns the number of items actually dequeued, which can
+	// be 0 if the timeout expires while waiting for elements,
+	// and at most max.
+	// Never allocates. Thread-safe.
+	template<typename It, typename Rep, typename Period>
+	inline size_t wait_dequeue_bulk_timed(consumer_token_t& token, It itemFirst, size_t max, std::chrono::duration<Rep, Period> const& timeout)
+    {
+        return wait_dequeue_bulk_timed<It&>(token, itemFirst, max, std::chrono::duration_cast<std::chrono::microseconds>(timeout).count());
+    }
+	
+	
+	// Returns an estimate of the total number of elements currently in the queue. This
+	// estimate is only accurate if the queue has completely stabilized before it is called
+	// (i.e. all enqueue and dequeue operations have completed and their memory effects are
+	// visible on the calling thread, and no further operations start while this method is
+	// being called).
+	// Thread-safe.
+	inline size_t size_approx() const
+	{
+		return (size_t)sema->availableApprox();
+	}
+	
+	
+	// Returns true if the underlying atomic variables used by
+	// the queue are lock-free (they should be on most platforms).
+	// Thread-safe.
+	static constexpr bool is_lock_free()
+	{
+		return ConcurrentQueue::is_lock_free();
+	}
+	
+
+private:
+	template<typename U, typename A1, typename A2>
+	static inline U* create(A1&& a1, A2&& a2)
+	{
+		void* p = (Traits::malloc)(sizeof(U));
+		return p != nullptr ? new (p) U(std::forward<A1>(a1), std::forward<A2>(a2)) : nullptr;
+	}
+	
+	template<typename U>
+	static inline void destroy(U* p)
+	{
+		if (p != nullptr) {
+			p->~U();
+		}
+		(Traits::free)(p);
+	}
+	
+private:
+	ConcurrentQueue inner;
+	std::unique_ptr<LightweightSemaphore, void (*)(LightweightSemaphore*)> sema;
+};
+
+
+template<typename T, typename Traits>
+inline void swap(BlockingConcurrentQueue<T, Traits>& a, BlockingConcurrentQueue<T, Traits>& b) MOODYCAMEL_NOEXCEPT
+{
+	a.swap(b);
+}
+
+}	// end namespace moodycamel
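The timed overloads above all follow one contract. On timeout they return `false` and leave `item` untouched. A negative `timeout_usecs` waits indefinitely. The `std::chrono` overloads just convert to microseconds. The following is a minimal standalone sketch of that contract built on a mutex and a condition variable. `TimedQueue` is a hypothetical name, and this is not how moodycamel implements it: the real queue pairs a lock-free `ConcurrentQueue` with a `LightweightSemaphore`.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <deque>
#include <mutex>
#include <utility>

// Hypothetical lock-based stand-in that mirrors the documented contract of
// BlockingConcurrentQueue::wait_dequeue_timed; NOT moodycamel's lock-free code.
template <typename T>
class TimedQueue {
public:
    void enqueue(T v) {
        {
            std::lock_guard<std::mutex> lk(m_);
            q_.push_back(std::move(v));
        }
        cv_.notify_one();
    }

    // Returns false and leaves `item` untouched if the timeout expires;
    // a negative timeout waits indefinitely, like wait_dequeue.
    bool wait_dequeue_timed(T& item, std::int64_t timeout_usecs) {
        std::unique_lock<std::mutex> lk(m_);
        auto ready = [this] { return !q_.empty(); };
        if (timeout_usecs < 0) {
            cv_.wait(lk, ready);
        } else if (!cv_.wait_for(lk, std::chrono::microseconds(timeout_usecs), ready)) {
            return false;
        }
        item = std::move(q_.front());
        q_.pop_front();
        return true;
    }

    // chrono overload: convert to microseconds and forward, as the original does.
    template <typename Rep, typename Period>
    bool wait_dequeue_timed(T& item, std::chrono::duration<Rep, Period> const& timeout) {
        return wait_dequeue_timed(
            item, std::chrono::duration_cast<std::chrono::microseconds>(timeout).count());
    }

private:
    std::mutex m_;
    std::condition_variable cv_;
    std::deque<T> q_;
};
```

Because a timeout never assigns to `item`, callers can pre-set a sentinel and rely on it surviving a failed wait.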
--- a/duix-sdk/src/main/cpp/dhcore/concurrentqueue.h
+++ b/duix-sdk/src/main/cpp/dhcore/concurrentqueue.h
--- a/duix-sdk/src/main/cpp/dhcore/dh_atomic.h
+++ b/duix-sdk/src/main/cpp/dhcore/dh_atomic.h
--- a/duix-sdk/src/main/cpp/dhcore/dh_data.cpp
+++ b/duix-sdk/src/main/cpp/dhcore/dh_data.cpp
@ -0,0 +1,391 @@
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+#include <pthread.h>
+#include "dh_data.h"
+#ifdef WIN32
+#include <windows.h>
+#else
+//#include <sys/timeb.h>
+#include <unistd.h>
+#endif
+#include <time.h>
+#include "dh_mem.h"
+
+jmat_t* jmat_addref(jmat_t* mat){
+  if(mat) dhmem_ref(mat);
+  return mat;
+}
+
+jmat_t* jmat_deref(jmat_t* mat){
+  if(!mat)return NULL;
+  return (jmat_t*)dhmem_deref(mat);
+}
+
+void* jdata_addref(void* data){
+  if(!data)return NULL;
+  return dhmem_ref(data);
+}
+
+void* jdata_deref(void* data){
+  if(!data)return NULL;
+  return dhmem_deref(data);
+}
+
+static void my_jbuf_destroy(void* arg){
+  jbuf_t* buf = (jbuf_t*)arg;
+  //printf("===jbuf destroy %p\n",buf);
+  //
+}
+
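+jbuf_t* jbuf_allocex(char* mem,int size,dhmem_destroy_h fndestroy){
+  jbuf_t* buf = (jbuf_t*)dhmem_alloc(sizeof(jbuf_t),fndestroy);
+  if(!buf)return NULL;
+  memset(buf,0,sizeof(jbuf_t));
+  // wrap caller-owned memory; releasing it is left to fndestroy
+  buf->data = mem;
+  buf->size = size;
+  buf->ref = 1;
+  return buf;
+}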
+
+jbuf_t* jbuf_alloc(int size){
+  int len = size>0?size:0;
+  jbuf_t* buf = (jbuf_t*)dhmem_alloc(sizeof(jbuf_t)+len,my_jbuf_destroy);
+  //printf("===jbuf alloc %p\n",buf);
+  memset(buf,0,len+sizeof(jbuf_t));
+  if(size>0){
+    buf->data = (char*)buf + sizeof(jbuf_t);
+  }else{
+    buf->data = NULL;
+  }
+  buf->size = size;
+  return buf;
+}
+
+jbuf_t* jbuf_strdup(char* txt,int pos){
+  int len = strlen(txt);
+  if((pos>0)&&(pos<len))len = pos;
+  char* pb = txt; 
+  int size = len;
+  jbuf_t* buf = jbuf_alloc(size+1);
+  memcpy(buf->data,pb,size);
+  buf->data[size]=0;
+  return buf;
+}
+
+jbuf_t* jbuf_dupmem(char* mem,int size){
+  jbuf_t* buf = jbuf_alloc(size);
+  if(size) memcpy(buf->data,mem,size);
+  return buf;
+}
+
+jbuf_t* jbuf_refmem(char* mem,int size){
+  jbuf_t* buf = jbuf_alloc(0);
+  buf->data = mem;
+  buf->size = size;
+  buf->ref = 1;
+  return buf;
+}
+
+jbuf_t* jbuf_null(uint64_t sessid){
+  jbuf_t* buf = jbuf_alloc(0);
+  buf->sessid = sessid;
+  buf->data = NULL;
+  buf->size = 0;
+  buf->ref = 0;
+  return buf;
+}
+
+int       jbuf_zeros(jbuf_t* buf){
+  if(buf->size>0){
+    memset(buf->data,0,buf->size);
+  }
+  return 0;
+}
+
+int       jbuf_free(jbuf_t* buf){
+  dhmem_deref(buf);
+  return 0;
+}
+
+int       jbuf_copy(jbuf_t* dst,jbuf_t* src){
+  int size = src->size;
+  if(size>dst->size)size = dst->size;
+  memcpy(dst->data,src->data,size);
+  return 0;
+}
+
+int       jmat_dump(jmat_t* mat){
+  if(mat->gpu){
+    printf("===w %d h %d c %d d %d b %d p %p \n",
+        mat->width,mat->height,mat->channel,mat->stride,mat->bit,mat->data);
+    return 0;
+  }
+  printf("===w %d h %d c %d d %d b %d p %p [\n",
+      mat->width,mat->height,mat->channel,mat->stride,mat->bit,mat->data);
+  int rgb = (mat->channel==3)?1:0;
+  if(mat->bit == 4){
+    for(int m=0;m<3;m++){
+      printf("[");
+      float* pa = (float*)jmat_row(mat,m);
+      for(int k=0;k<3;k++){
+        if(rgb){
+          printf("[%f %f %f]",pa[0],pa[1],pa[2]);
+          pa+=3;
+        }else{
+          printf(" %f ",*pa++);
+        }
+      }
+      if(rgb){
+        pa = (float*)jmat_row(mat,m) + mat->width*mat->channel - 9;
+      }else{
+        pa = (float*)jmat_row(mat,m) + mat->width*mat->channel - 3;
+      }
+      //printf("\n====offset %ld\n",(char*)pa - mat->data);
+      printf("====");
+      for(int k=0;k<3;k++){
+        if(rgb){
+          printf("[%f %f %f]",pa[0],pa[1],pa[2]);
+          pa+=3;
+        }else{
+          printf(" %f ",*pa++);
+        }
+      }
+      printf("]\n");
+    }
+    for(int m=3;m>0;m--){
+      printf("[");
+      float* pa = (float*)jmat_row(mat,mat->height - m);
+      for(int k=0;k<3;k++){
+        if(rgb){
+          printf("[%f %f %f]",pa[0],pa[1],pa[2]);
+          pa+=3;
+        }else{
+          printf(" %f ",*pa++);
+        }
+      }
+      if(rgb){
+        pa = (float*)jmat_row(mat,mat->height - m) + mat->width*mat->channel - 9;
+      }else{
+        pa = (float*)jmat_row(mat,mat->height - m) + mat->width*mat->channel - 3;
+      }
+      printf("====");
+      for(int k=0;k<3;k++){
+        if(rgb){
+          printf("[%f %f %f]",pa[0],pa[1],pa[2]);
+          pa+=3;
+        }else{
+          printf(" %f ",*pa++);
+        }
+      }
+      printf("]\n");
+    }
+  }else{
+    for(int m=0;m<3;m++){
+      printf("[");
+      uint8_t* pa = (uint8_t*)jmat_row(mat,m);
+      for(int k=0;k<3;k++){
+        printf("[%d %d %d]",pa[0],pa[1],pa[2]);
+        pa+=3;
+      }
+      pa = (uint8_t*)jmat_row(mat,m) + mat->width*mat->channel - 9;
+      printf("====");
+      for(int k=0;k<3;k++){
+        printf("[%d %d %d]",pa[0],pa[1],pa[2]);
+        pa+=3;
+      }
+      printf("]\n");
+    }
+    for(int m=3;m>0;m--){
+      printf("[");
+      uint8_t* pa = (uint8_t*)jmat_row(mat,mat->height - m);
+      for(int k=0;k<3;k++){
+        printf("[%d %d %d]",pa[0],pa[1],pa[2]);
+        pa+=3;
+      }
+      pa = (uint8_t*)jmat_row(mat,mat->height - m) + mat->width*mat->channel - 9;
+      printf("====");
+      for(int k=0;k<3;k++){
+        printf("[%d %d %d]",pa[0],pa[1],pa[2]);
+        pa+=3;
+      }
+      printf("]\n");
+    }
+  }
+  printf("]=====\n");
+  return 0;
+}
+
+static void my_jmat_destroy(void* arg){
+  jmat_t* mat = (jmat_t*)arg;
+  if(!mat->buf.ref){
+    dhmem_deref(mat->data);
+    mat->data = NULL;
+  }
+  jbuf_t* buf = mat->buf.next;
+  while(buf){
+    jbuf_t* tbuf = buf;
+    buf = buf->next;
+    dhmem_deref(tbuf);
+  }
+  //if(mat->rmat)dhmem_deref(mat->rmat);
+  //if(mat->bmat)dhmem_deref(mat->bmat);
+  //printf("===jmat destroy %p \n",mat);
+}
+
+jmat_t* jmat_allocex(int w,int h,int c ,int d, int b,void* mem,dhmem_destroy_h fndestroy){
+  int bit = b?b:1;
+  int stride = d?d:w*c;
+  int size = bit*stride*h;
+  int realsize = sizeof(jmat_t);
+  jmat_t* mat = (jmat_t*)dhmem_alloc(realsize,fndestroy);
+  memset(mat,0,realsize);
+  jbuf_t* buf = (jbuf_t*)&mat->buf;
+  mat->width = w;
+  mat->height = h;
+  mat->channel = c;
+  mat->bit = bit;
+  mat->stride = stride;
+  buf->data = (char*)mem;
+  buf->size = size;
+  mat->data = buf->data;
+  return mat;
+}
+
+jmat_t* jmat_null(){
+  jmat_t* mat = (jmat_t*)dhmem_zalloc(sizeof(jmat_t),my_jmat_destroy);
+  return mat;
+}
+
+jmat_t* jmat_alloc(int w,int h,int c ,int d, int b,void* mem){
+  int bit = b?b:1;
+  int stride = d?d:w*c;
+  int size = bit*stride*h;
+  int realsize = sizeof(jmat_t);
+  jmat_t* mat = (jmat_t*)dhmem_zalloc(realsize,my_jmat_destroy);
+  //printf("===jmat alloc %p\n",mat);
+  //printf("===jmat alloc %p \n",mat);
+  jbuf_t* buf = (jbuf_t*)&mat->buf;
+  mat->width = w;
+  mat->height = h;
+  mat->channel = c;
+  mat->bit = bit;
+  mat->stride = stride;
+  if(mem){
+    buf->data = (char*)mem;
+    buf->ref = 1;
+  }else{
+    buf->data = (char*)dhmem_zalloc(size,NULL);
+  }
+  buf->size = size;
+  mat->data = buf->data;
+  return mat;
+}
+
+jmat_t* jmat_crgb(int w,int h,uint8_t *mem){
+  jmat_t* mat = jmat_alloc(w,h,3,0,1,mem);
+  return mat;
+}
+
+char*   jmat_row(jmat_t* mat,int row){
+  if(row<0||row>=mat->height)return NULL;
+  int offset =  row*mat->stride*mat->bit;
+  //printf("==row %d stride %d offset %d\n",row,mat->stride,offset);
+  return mat->data + offset;
+}
+
+char*   jmat_item(jmat_t* mat,int col,int row){
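+  if(row<0||row>=mat->height)return NULL;
+  if(col<0||col>=mat->width)return NULL;
+  // columns advance by whole pixels (channel elements), matching jmat_roi
+  int offset =  row*mat->stride*mat->bit + col*mat->channel*mat->bit;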
+  return mat->data + offset;
+}
+
+int       jmat_zero(jmat_t* src){
+  return jbuf_zeros(&src->buf);
+}
+
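+jmat_t*   jmat_clone(jmat_t* mat){
+  jmat_t* dst = jmat_alloc(mat->width,mat->height,mat->channel,mat->stride,mat->bit,NULL);
+  // copy row by row: for a ROI view, buf.size (bit*stride*height) runs past
+  // the end of the last row and could read beyond the parent buffer
+  int rowbytes = mat->width*mat->channel*mat->bit;
+  for(int r=0;r<mat->height;r++){
+    memcpy(jmat_row(dst,r),jmat_row(mat,r),rowbytes);
+  }
+  return dst;
+}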
+
+int     jmat_free(jmat_t* mat){
+  if(mat) dhmem_deref(mat);
+  //printf("===jmat free %p \n",mat);
+  return 0;
+}
+
+int       jmat_copy(jmat_t* dst,jmat_t* src){
+  if(dst->buf.size!=src->buf.size)return -1;
+  memcpy(dst->data,src->data,dst->buf.size);
+  return 0;
+}
+
+int jmat_reshape(jmat_t* mat,int w,int h){
+  mat->width = w;
+  mat->height = h;
+  mat->stride = w*mat->channel;
+  return 0;
+}
+
+int jmat_reroi(jmat_t* mat,jmat_t* src,int w,int h,int l,int t){
+  int d = src->stride;
+  int c = src->channel;
+  int b = src->bit;
+  int s = b*d*h;
+  char* mem = src->data + t*d*b + l*c*b;
+  //
+  jbuf_t* buf = (jbuf_t*)&mat->buf;
+
+  mat->width = w;
+  mat->height = h;
+  mat->channel = c;
+  mat->bit = b;
+  mat->stride = d;
+  buf->data = (char*)mem;
+  buf->size = s;
+  mat->data = buf->data;
+  mat->gpu = src->gpu;
+  mat->buf.ref = 1;
+  return 0;
+}
+
+jmat_t* jmat_roi(jmat_t* mat,int w,int h,int l,int t){
+  int d = mat->stride;
+  int c = mat->channel;
+  int b = mat->bit;
+  char* roidata = mat->data + t*d*b + l*c*b;
+  jmat_t* roimat = jmat_alloc(w,h,c,d,b,roidata);
+  roimat->gpu = mat->gpu;
+  return roimat;
+}
+
+
+uint64_t jtimer_msstamp(){
+  struct timespec ts;
+#ifdef WIN32
+  //return clock();
+  clock_gettime(0, &ts);
+#else
+  clock_gettime(CLOCK_MONOTONIC, &ts);
+#endif
+  // 64-bit math avoids overflow where long is 32-bit; 1 ms = 1,000,000 ns
+  return ((uint64_t)ts.tv_sec*1000u) + (uint64_t)(ts.tv_nsec/1000000);
+}
+
+
+void jtimer_mssleep(int ms) {
+#ifdef WIN32
+  Sleep(ms);
+#else
+  /*
+     struct timeval delay;
+     delay.tv_sec = 0;
+     delay.tv_usec = ms * 1000; // 20 ms
+     select(0, NULL, NULL, NULL, &delay);
+     */
+  usleep(ms*1000);
+#endif
+}
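`jmat_row`, `jmat_roi` and `jmat_reroi` address a pixel as `(row*stride + col*channel)*bit` bytes from `data`, with `stride` counted in elements and `bit` in bytes per element. A ROI keeps the parent's stride, so its last row ends well before `bit*stride*height` bytes. The sketch below uses a hypothetical `MatView` struct (not `jmat_t` itself) and hypothetical helper names to show the offset arithmetic and why a ROI copy should go row by row.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical mirror of the jmat_t geometry fields: stride in elements,
// bit = bytes per element. Not the SDK's struct.
struct MatView {
    unsigned char* data;
    int width, height, channel, stride, bit;
};

// Byte offset of pixel (col, row): rows advance by `stride` elements,
// columns by `channel` elements, each element `bit` bytes wide.
inline std::size_t pixel_offset(const MatView& m, int col, int row) {
    return static_cast<std::size_t>(row) * m.stride * m.bit +
           static_cast<std::size_t>(col) * m.channel * m.bit;
}

// ROI view over a parent, computed the way jmat_roi does: same stride, shifted origin.
inline MatView roi(const MatView& m, int w, int h, int l, int t) {
    return MatView{m.data + pixel_offset(m, l, t), w, h, m.channel, m.stride, m.bit};
}

// Copy a (possibly strided) view into a dense buffer one row at a time,
// so nothing past the last ROI row is ever read.
inline std::vector<unsigned char> copy_dense(const MatView& m) {
    const std::size_t rowbytes = static_cast<std::size_t>(m.width) * m.channel * m.bit;
    std::vector<unsigned char> out(rowbytes * m.height);
    for (int r = 0; r < m.height; ++r) {
        std::memcpy(out.data() + rowbytes * r, m.data + pixel_offset(m, 0, r), rowbytes);
    }
    return out;
}
```

For a 2x2 ROI at (2,1) of a 4x3 RGB `uint8` image, the ROI starts 18 bytes in. Copying `stride*height` = 24 bytes from there would run 6 bytes past the 36-byte parent, while the row-wise copy reads exactly the 12 ROI bytes.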
--- a/duix-sdk/src/main/cpp/dhcore/dh_data.h
+++ b/duix-sdk/src/main/cpp/dhcore/dh_data.h
@ -0,0 +1,77 @@
+#ifndef GJ_MEDDATA_H
+#define GJ_MEDDATA_H
+#include <stdint.h>
+#include "dh_mem.h"
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+  typedef struct jbuf_s jbuf_t;
+
+  struct jbuf_s{
+    char    *data;
+    int     size;
+    uint64_t  sessid;
+    int64_t   tag;
+    int     ref;
+    jbuf_t  *next;
+  };
+
+  jbuf_t* jbuf_alloc(int size);
+  jbuf_t* jbuf_strdup(char* txt,int pos);
+  jbuf_t* jbuf_dupmem(char* mem,int size);
+  jbuf_t* jbuf_refmem(char* mem,int size);
+  jbuf_t* jbuf_null(uint64_t sessid);
+  int       jbuf_zeros(jbuf_t* buf);
+  int       jbuf_free(jbuf_t* buf);
+  int       jbuf_copy(jbuf_t* dst,jbuf_t* src);
+
+  typedef struct jmat_s jmat_t;
+  struct jmat_s{
+    jbuf_t    buf;
+    char      *data;
+    int       width;
+    int       height;
+    int       channel;
+    int       stride;
+    int       bit;  
+    int       gpu;
+    //jmat_t    *rmat;    
+    //jmat_t    *bmat;    
+  };
+
+  jmat_t* jmat_null();
+  jmat_t* jmat_allocex(int w,int h,int c ,int d, int b,void* mem,dhmem_destroy_h fndestroy);
+  jmat_t* jmat_alloc(int w,int h,int c ,int d, int b,void* mem);
+  jmat_t* jmat_crgb(int w,int h,uint8_t *mem); 
+  char*   jmat_row(jmat_t* mat,int row);
+  char*   jmat_item(jmat_t* mat,int col,int row);
+  int     jmat_free(jmat_t* mat);
+  int jmat_reshape(jmat_t* mat,int w,int h);
+  jmat_t* jmat_roi(jmat_t* mat,int w,int h,int l,int t);
+  int       jmat_reroi(jmat_t* mat,jmat_t* src,int w,int h,int l,int t);
+  int       jmat_copy(jmat_t* dst,jmat_t* src);
+  int       jmat_dump(jmat_t* mat);
+
+  jmat_t*   jmat_clone(jmat_t* mat);
+  //jmat_t* jmat_roi(jmat_t* src,int left,int top,int width,int height);
+  int       jmat_zero(jmat_t* src);
+
+  jmat_t* jmat_addref(jmat_t* mat);
+  jmat_t* jmat_deref(jmat_t* mat);
+
+
+
+
+  void* jdata_addref(void* data);
+  void* jdata_deref(void* data);
+  uint64_t jtimer_msstamp();  
+  void jtimer_mssleep(int ms) ;
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
--- a/duix-sdk/src/main/cpp/dhcore/dh_mem.c
+++ b/duix-sdk/src/main/cpp/dhcore/dh_mem.c
@ -0,0 +1,300 @@
+/**
+ * @file dh_mem.c  Memory management with reference counting
+ *
+ * Copyright (C) 2010 Creytiv.com
+ */
+#include <ctype.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "dh_atomic.h"
+#include "dh_mem.h"
+
+
+
+
+
+/** Defines a reference-counting memory object */
+struct dhmem {
+	DH_ATOMIC uint32_t nrefs; /**< Number of references  */
+	uint32_t size;         /**< Size of memory object */
+	dhmem_destroy_h *dh;     /**< Destroy handler       */
+};
+
+
+#define STAT_ALLOC(_m, _size) (_m)->size = (uint32_t)(_size);
+#define STAT_REALLOC(_m, _size) (_m)->size = (uint32_t)(_size);
+#define STAT_DEREF(_m)
+#define MAGIC_CHECK(_m)
+
+
+enum {
+#if defined(__x86_64__)
+	/* Use 16-byte alignment on x86-x32 as well */
+	dhmem_alignment = 16u,
+#else
+	dhmem_alignment = sizeof(void*) >= 8u ? 16u : 8u,
+#endif
+	alignment_mask = dhmem_alignment - 1u,
+	dhmem_header_size = (sizeof(struct dhmem) + alignment_mask) &
+		(~(size_t)alignment_mask)
+};
+
+#define MEM_SIZE_MAX \
+	(size_t)(sizeof(size_t) > sizeof(uint32_t) ? \
+		(~(uint32_t)0u) : (~(size_t)0u) - dhmem_header_size)
+
+
+static inline struct dhmem *get_mem(void *p)
+{
+	return (struct dhmem *)(void *)(((unsigned char *)p) - dhmem_header_size);
+}
+
+
+static inline void *get_dhmem_data(struct dhmem *m)
+{
+	return (void *)(((unsigned char *)m) + dhmem_header_size);
+}
+
+char    *dhstr_dup(char* txt){
+  if(!txt)return NULL;
+  size_t len = strlen(txt);
+  char* str = (char*)dhmem_zalloc(len+1,NULL);
+  if(!str)return NULL;
+  memcpy(str,txt,len);
+  return str;
+}
+
+/**
+ * Allocate a new reference-counted memory object
+ *
+ * @param size Size of memory object
+ * @param dh   Optional destructor, called when destroyed
+ *
+ * @return Pointer to allocated object
+ */
+void *dhmem_alloc(size_t size, dhmem_destroy_h *dh)
+{
+	struct dhmem *m;
+
+	if (size > MEM_SIZE_MAX)
+		return NULL;
+
+
+	m = (struct dhmem*)malloc(dhmem_header_size + size);
+	if (!m)
+		return NULL;
+
+	dh_atomic_rlx_set(&m->nrefs, 1u);
+	m->dh    = dh;
+
+	STAT_ALLOC(m, size);
+
+	return get_dhmem_data(m);
+}
+
+
+/**
+ * Allocate a new reference-counted memory object. Memory is zeroed.
+ *
+ * @param size Size of memory object
+ * @param dh   Optional destructor, called when destroyed
+ *
+ * @return Pointer to allocated object
+ */
+void *dhmem_zalloc(size_t size, dhmem_destroy_h *dh)
+{
+	void *p;
+
+	p = dhmem_alloc(size, dh);
+	if (!p)
+		return NULL;
+
+	memset(p, 0, size);
+
+	return p;
+}
+
+
+/**
+ * Re-allocate a reference-counted memory object
+ *
+ * @param data Memory object
+ * @param size New size of memory object
+ *
+ * @return New pointer to allocated object
+ *
+ * @note Realloc NULL pointer is not supported
+ */
+void *dhmem_realloc(void *data, size_t size)
+{
+	struct dhmem *m, *m2;
+
+	if (!data)
+		return NULL;
+
+	if (size > MEM_SIZE_MAX)
+		return NULL;
+
+	m = get_mem(data);
+
+	MAGIC_CHECK(m);
+
+	if (dh_atomic_acq(&m->nrefs) > 1u) {
+		void* p = dhmem_alloc(size, m->dh);
+		if (p) {
+			/* the new size may be smaller than the old one: copy only what fits */
+			memcpy(p, data, m->size < size ? m->size : size);
+			dhmem_deref(data);
+		}
+		return p;
+	}
+
+
+	m2 = (struct dhmem*)realloc(m, dhmem_header_size + size);
+
+	if (!m2) {
+		return NULL;
+	}
+
+	STAT_REALLOC(m2, size);
+
+	return get_dhmem_data(m2);
+}
+
+
+/**
+ * Re-allocate a reference-counted array
+ *
+ * @param ptr      Pointer to existing array, NULL to allocate a new array
+ * @param nmemb    Number of members in array
+ * @param membsize Number of bytes in each member
+ * @param dh       Optional destructor, only used when ptr is NULL
+ *
+ * @return New pointer to allocated array
+ */
+void *dhmem_reallocarray(void *ptr, size_t nmemb, size_t membsize,
+		       dhmem_destroy_h *dh)
+{
+	size_t tsize;
+
+	if (membsize && nmemb > MEM_SIZE_MAX / membsize) {
+		return NULL;
+	}
+
+	tsize = nmemb * membsize;
+
+	if (ptr) {
+		return dhmem_realloc(ptr, tsize);
+	}
+	else {
+		return dhmem_alloc(tsize, dh);
+	}
+}
+
+
+/**
+ * Set or unset a destructor for a memory object
+ *
+ * @param data Memory object
+ * @param dh   called when destroyed, NULL for remove
+ */
+void dhmem_destructor(void *data, dhmem_destroy_h *dh)
+{
+	struct dhmem *m;
+
+	if (!data)
+		return;
+
+	m = get_mem(data);
+
+	MAGIC_CHECK(m);
+
+	m->dh = dh;
+}
+
+
+/**
+ * Reference a reference-counted memory object
+ *
+ * @param data Memory object
+ *
+ * @return Memory object (same as data)
+ */
+void *dhmem_ref(void *data)
+{
+	struct dhmem *m;
+
+	if (!data)
+		return NULL;
+
+	m = get_mem(data);
+
+	MAGIC_CHECK(m);
+
+	dh_atomic_rlx_add(&m->nrefs, 1u);
+
+	return data;
+}
+
+
+/**
+ * Dereference a reference-counted memory object. When the reference count
+ * is zero, the destroy handler will be called (if present) and the memory
+ * will be freed
+ *
+ * @param data Memory object
+ *
+ * @return Always NULL
+ */
+/* coverity[-tainted_data_sink: arg-0] */
+void *dhmem_deref(void *data)
+{
+	struct dhmem *m;
+
+	if (!data)
+		return NULL;
+
+	m = get_mem(data);
+
+	MAGIC_CHECK(m);
+
+	if (dh_atomic_acq_sub(&m->nrefs, 1u) > 1u) {
+		return NULL;
+	}
+
+	if (m->dh)
+		m->dh(data);
+
+	/* NOTE: check if the destructor called dhmem_ref() */
+	if (dh_atomic_rlx(&m->nrefs) > 0u)
+		return NULL;
+
+
+	STAT_DEREF(m);
+
+	free(m);
+
+	return NULL;
+}
+
+
+/**
+ * Get number of references to a reference-counted memory object
+ *
+ * @param data Memory object
+ *
+ * @return Number of references
+ */
+uint32_t dhmem_nrefs(const void *data)
+{
+	struct dhmem *m;
+
+	if (!data)
+		return 0;
+
+	m = get_mem((void*)data);
+
+	MAGIC_CHECK(m);
+
+	return (uint32_t)dh_atomic_acq(&m->nrefs);
+}
+
+
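`dh_mem.c` (adapted from Creytiv's libre) places a small header holding the reference count and destroy handler immediately before the pointer it returns, and recovers it by subtracting the aligned header size. A compact C++ sketch of the same layout follows. `rc_alloc`, `rc_ref`, `rc_deref` and `Header` are hypothetical names, and the decrement uses `acq_rel` ordering, a common choice for reference counts, rather than whatever `dh_atomic_acq_sub` maps to.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical standalone sketch of the dhmem layout: header first, payload after.
using destroy_fn = void (*)(void*);

struct Header {
    std::atomic<std::uint32_t> nrefs;
    destroy_fn dh;
};

// Round the header up so the payload keeps max_align_t alignment.
constexpr std::size_t kHeader =
    (sizeof(Header) + alignof(std::max_align_t) - 1) & ~(alignof(std::max_align_t) - 1);

inline Header* header_of(void* p) {
    return reinterpret_cast<Header*>(static_cast<unsigned char*>(p) - kHeader);
}

inline void* rc_alloc(std::size_t size, destroy_fn dh) {
    void* raw = std::malloc(kHeader + size);
    if (!raw) return nullptr;
    Header* h = new (raw) Header{};  // construct the header in place
    h->nrefs.store(1, std::memory_order_relaxed);
    h->dh = dh;
    return static_cast<unsigned char*>(raw) + kHeader;
}

inline void* rc_ref(void* p) {
    if (p) header_of(p)->nrefs.fetch_add(1, std::memory_order_relaxed);
    return p;
}

// Drop one reference; the last one runs the destructor and frees the block.
// Unlike dhmem_deref, this sketch does not handle a destructor that re-references p.
inline void* rc_deref(void* p) {
    if (!p) return nullptr;
    Header* h = header_of(p);
    if (h->nrefs.fetch_sub(1, std::memory_order_acq_rel) > 1) return nullptr;
    if (h->dh) h->dh(p);
    h->~Header();
    std::free(h);
    return nullptr;
}
```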
--- a/duix-sdk/src/main/cpp/dhcore/dh_mem.h
+++ b/duix-sdk/src/main/cpp/dhcore/dh_mem.h
@ -0,0 +1,28 @@
+#ifndef GJ_DHMEM_H
+#define GJ_DHMEM_H
+#include <stdint.h>
+#include <string.h>
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+typedef void (dhmem_destroy_h)(void *data);
+
+char    *dhstr_dup(char* txt);
+void    *dhmem_alloc(size_t size, dhmem_destroy_h *dh);
+void    *dhmem_zalloc(size_t size, dhmem_destroy_h *dh);
+void    *dhmem_realloc(void *data, size_t size);
+void    *dhmem_reallocarray(void *ptr, size_t nmemb,
+			  size_t membsize, dhmem_destroy_h *dh);
+void     dhmem_destructor(void *data, dhmem_destroy_h *dh);
+void    *dhmem_ref(void *data);
+void    *dhmem_deref(void *data);
+uint32_t dhmem_nrefs(const void *data);
+
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
--- a/duix-sdk/src/main/cpp/dhcore/dh_que.cpp
+++ b/duix-sdk/src/main/cpp/dhcore/dh_que.cpp
@ -0,0 +1,241 @@
+#include "dh_que.h"
+#include <pthread.h>   // pthread_mutex_t in jlockque_t
+
+#include "readerwriterqueue.h"
+#include "concurrentqueue.h"
+#include "blockingconcurrentqueue.h"
+#include "dh_atomic.h"
+
+typedef moodycamel::ReaderWriterQueue<jbuf_t*>  ReaderWriterQueue;
+typedef moodycamel::ConcurrentQueue<jbuf_t*> ConcurrentQueue;
+typedef moodycamel::BlockingConcurrentQueue<jbuf_t*> BlockingConcurrentQueue;
+
+typedef int (*jqfn_pop)(jqueue_t* que,int flush,jbuf_t** pbuf);
+typedef int (*jqfn_push)(jqueue_t* que,int flush,jbuf_t* buf);
+
+struct jqueue_s{
+  void        *m_obj;
+  int         m_kind;
+  DH_ATOMIC uint32_t nrefs;   /**< Approximate number of queued buffers (not a refcount) */
+  int         m_cache;
+  //jbuf_t      *m_readcache;
+  //jbuf_t      *m_writecache;
+  jqfn_push   fn_push;
+  jqfn_pop    fn_pop;
+  uint64_t    m_lastsess;
+};
+
+typedef struct{
+  jqueue_t  que;
+  jbuf_t    *m_head;
+  jbuf_t    *m_tail;
+  pthread_mutex_t  m_lock; 
+}jlockque_t;
+
+static int simp_push(jqueue_t* que,int flush,jbuf_t* buf){
+  void* obj = que->m_obj;
+  //if(flush){
+  return reinterpret_cast<ReaderWriterQueue*>(obj)->enqueue(buf);
+  //}else{
+  //return reinterpret_cast<ReaderWriterQueue*>(obj)->try_enqueue(buf);
+  //}
+}
+
+static int simp_pop(jqueue_t* que,int flush,jbuf_t** pbuf){
+  void* obj = que->m_obj;
+  return reinterpret_cast<ReaderWriterQueue*>(obj)->try_dequeue(*pbuf);
+}
+
+static int muti_push(jqueue_t* que,int flush,jbuf_t* buf){
+  void* obj = que->m_obj;
+  //if(flush){
+  return reinterpret_cast<BlockingConcurrentQueue*>(obj)->enqueue(buf);
+  //}else{
+  //return reinterpret_cast<BlockingConcurrentQueue*>(obj)->try_enqueue(buf);
+  //}
+}
+
+static int muti_pop(jqueue_t* que,int flush,jbuf_t** pbuf){
+  void* obj = que->m_obj;
+  return reinterpret_cast<BlockingConcurrentQueue*>(obj)->try_dequeue(*pbuf);
+}
+
+static int lock_push(jqueue_t* que,int flush,jbuf_t* buf){
+  jlockque_t* exque = reinterpret_cast<jlockque_t*>(que);
+  buf->next = NULL;
+  pthread_mutex_lock(&exque->m_lock);
+  if(exque->m_tail){
+    exque->m_tail->next = buf;   // append after the current tail
+  }else{
+    exque->m_head = buf;         // empty queue: buf becomes the head
+  }
+  exque->m_tail = buf;
+  pthread_mutex_unlock(&exque->m_lock);
+  //printf("===push %p one %d %p\n",que,que->m_size,buf);
+  //printf("===que %p head %p tail %p\n",que,exque->m_head,exque->m_tail);
+  return 1;  //
+}
+
+static int lock_pop(jqueue_t* que,int flush,jbuf_t** pbuf){
+  jlockque_t* exque = reinterpret_cast<jlockque_t*>(que);
+  jbuf_t* buf = NULL;
+  int rst = 0;
+  pthread_mutex_lock(&exque->m_lock);
+  buf = exque->m_head;
+  if(buf){
+    if(exque->m_tail==buf){
+      exque->m_head = NULL;
+      exque->m_tail = NULL;
+    }else{
+      exque->m_head = buf->next;
+    }
+    buf->next = NULL;
+  }
+  pthread_mutex_unlock(&exque->m_lock);
+  //printf("===pop %p one %d %p\n",que,que->m_size,buf);
+  //printf("===que %p head %p tail %p\n",que,exque->m_head,exque->m_tail);
+  *pbuf = buf;
+  rst = buf ? 1 : 0;  // nonzero on success, like simp_pop/muti_pop
+  return rst;
+}
+
+
+void my_jque_destroy(void* arg){
+  jqueue_t* que = (jqueue_t*)arg;
+
+  /*
+     buf= que->m_readcache;
+     while(buf){
+     jbuf_t* one = buf->next;
+     jbuf_free(buf);
+     buf = one;
+     }
+     buf= que->m_writecache;
+     while(buf){
+     jbuf_t* one = buf->next;
+     jbuf_free(buf);
+     buf = one;
+     }
+     */
+  jbuf_t* buf  = NULL;
+  que->fn_pop(que,1,&buf);
+  while(buf){
+    jbuf_free(buf);
+    buf = NULL;
+    que->fn_pop(que,1,&buf);
+    //printf("===free one %p\n",buf);
+  }
+
+  if(que->m_kind==GQUE_SIMP){
+    delete reinterpret_cast<ReaderWriterQueue*>(que->m_obj);
+  }else{
+    delete reinterpret_cast<BlockingConcurrentQueue*>(que->m_obj);
+  }
+
+}
+
+jqueue_t*  jque_alloc(int size,int cache,int kind){
+  jqueue_t* que = NULL;
+  //if(kind==GQUE_LOCK){
+  if(0){
+    jlockque_t* exq = (jlockque_t*)dhmem_alloc(sizeof(jlockque_t),my_jque_destroy);
+    memset(exq,0,sizeof(jlockque_t));
+    pthread_mutex_init(&exq->m_lock,NULL);
+    que = reinterpret_cast<jqueue_t*>(exq);
+    que->fn_pop = lock_pop;
+    que->fn_push = lock_push;
+  }else{
+    que = (jqueue_t*)dhmem_alloc(sizeof(jqueue_t),my_jque_destroy);
+    if(!que)return NULL;
+    memset(que,0,sizeof(jqueue_t));
+    if(kind==GQUE_SIMP){
+      que->m_obj = new ReaderWriterQueue();
+      que->fn_push = simp_push;
+      que->fn_pop = simp_pop;
+    }else {
+      que->m_obj = new BlockingConcurrentQueue();
+      que->fn_push = muti_push;
+      que->fn_pop = muti_pop;
+    }
+  }
+  if(que){
+    que->m_cache = cache;
+    que->m_kind = kind;
+  }
+  return que;
+}
+
+int jque_push(jqueue_t* que,jbuf_t* buf){
+  if(!buf)return 0;
+  if(buf->sessid>que->m_lastsess) que->m_lastsess = buf->sessid;
+  /*
+     if(que->m_cache){
+     while(que->m_writecache){
+     jbuf_t* one = que->m_writecache;
+     que->m_writecache = one->next;
+     que->fn_push(que,1,one);
+     }
+     }
+     */
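+  // count before publishing so a concurrent jque_pop cannot decrement first
+  dh_atomic_rlx_add(&que->nrefs, 1u);
+  int rst = que->fn_push(que,!que->m_cache,buf);
+  if(!rst) dh_atomic_acq_sub(&que->nrefs, 1u);  // enqueue failed: undo the count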
+  /*
+     if(!rst&&que->m_cache){
+     if(que->m_writecache){
+     jbuf_t* tail = que->m_writecache;
+     while(tail->next)tail = tail->next;
+     tail->next = buf;
+     }else{
+     que->m_writecache = buf;
+     }
+     rst = 1;
+     }
+     */
+  return rst;
+}
+
+jbuf_t* jque_pop(jqueue_t* que,uint64_t sessid){
+  if(sessid&&(sessid<que->m_lastsess)){
+    //printf("===last %ld of %ld\n",sessid,que->m_lastsess);
+    return NULL;
+  }
+  int rst = 0;
+  jbuf_t* buf = NULL;
+
+  /*
+     if(que->m_readcache){
+     buf = que->m_readcache;
+     que->m_readcache = que->m_readcache->next;
+     buf->next = NULL;
+     rst = 1;
+     }else{
+     */
+  rst = que->fn_pop(que,0,&buf);
+  if(buf)dh_atomic_acq_sub(&que->nrefs, 1u);
+  /*
+     if(rst&&buf){
+     que->m_readcache = buf->next;
+     buf->next = NULL;
+     }
+     }
+     */
+  return buf;
+}
+
+jbuf_t* jque_popall(jqueue_t* que){
+  return NULL;
+}
+
+int jque_size(jqueue_t* que){
+  return (int)dh_atomic_rlx(&que->nrefs);
+}
+
+int jque_free(jqueue_t* que){
+  dhmem_deref(que);
+  return 0;
+}
+
+
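`jque_push` records the highest `sessid` it has seen, and `jque_pop` returns `NULL` when the caller names a non-zero session older than that, so a consumer left over from a previous session simply sees an empty queue (session `0` skips the check). The check compares only the caller's session with the newest one pushed; buffers from older sessions that are still queued are still handed to a current-session caller. The single-threaded model below illustrates just that rule. `SessQueue` and `Buf` are hypothetical, and the real queue stores `jbuf_t*` in a moodycamel queue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical single-threaded model of the jque_push / jque_pop session rule.
struct Buf { std::uint64_t sessid; int payload; };

class SessQueue {
public:
    // Like jque_push: remember the newest session id, then enqueue.
    void push(const Buf& b) {
        if (b.sessid > last_) last_ = b.sessid;
        q_.push_back(b);
    }
    // Like jque_pop: false (the NULL case) for a stale non-zero session or an empty queue.
    bool pop(std::uint64_t sessid, Buf& out) {
        if (sessid && sessid < last_) return false;
        if (q_.empty()) return false;
        out = q_.front();
        q_.pop_front();
        return true;
    }
    std::size_t size() const { return q_.size(); }

private:
    std::uint64_t last_ = 0;
    std::deque<Buf> q_;
};
```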
--- a/duix-sdk/src/main/cpp/dhcore/dh_que.h
+++ b/duix-sdk/src/main/cpp/dhcore/dh_que.h
@ -0,0 +1,29 @@
+#ifndef GJ_MEDQUE_H
+#define GJ_MEDQUE_H
+#include "dh_data.h"
+
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+typedef struct jqueue_s jqueue_t;
+#define GQUE_SIMP  1001
+#define GQUE_MUTI  1003
+#define GQUE_LOCK  1005
+
+  jqueue_t*   jque_alloc(int size,int cache,int kind);
+  int         jque_push(jqueue_t* que,jbuf_t* buf);
+  jbuf_t*     jque_pop(jqueue_t* que,uint64_t sessid);
+  jbuf_t*     jque_popall(jqueue_t* que);
+  int         jque_size(jqueue_t* que);
+  int         jque_free(jqueue_t* que);
+
+
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
--- a/duix-sdk/src/main/cpp/dhcore/dh_types.h
+++ b/duix-sdk/src/main/cpp/dhcore/dh_types.h
@ -0,0 +1,389 @@
+/**
+ * @file dh_types.h  Defines basic types
+ *
+ * Copyright (C) 2010 Creytiv.com
+ */
+
+#include <assert.h>
+#include <stddef.h>
+#include <sys/types.h>
+
+#ifdef __cplusplus
+#define restrict
+#endif
+
+#ifdef _MSC_VER
+#include <stdlib.h>
+
+#include <BaseTsd.h>
+typedef SSIZE_T ssize_t;
+
+#endif
+
+/*
+ * Basic integral types and boolean from C99
+ */
+#include <inttypes.h>
+#include <stdbool.h>
+
+
+/* Needed for MS compiler */
+#ifdef _MSC_VER
+#ifndef __cplusplus
+#define inline _inline
+#endif
+#endif
+
+
+/*
+ * Misc macros
+ */
+
+/** Defines the NULL pointer */
+#ifndef NULL
+#define NULL ((void *)0)
+#endif
+
+/** Get number of elements in an array */
+#define DH_ARRAY_SIZE(a) ((sizeof(a))/(sizeof((a)[0])))
+
+
+/** Align a value to the boundary of mask */
+#define DH_ALIGN_MASK(x, mask)    (((x)+(mask))&~(mask))
+
+/** Check alignment of pointer (p) and byte count (c) **/
+#define re_is_aligned(p, c) (((uintptr_t)(const void *)(p)) % (c) == 0)
+
+/** Get the minimal value */
+#undef MIN
+#define MIN(a,b) (((a)<(b)) ? (a) : (b))
+
+/** Get the maximal value */
+#undef MAX
+#define MAX(a,b) (((a)>(b)) ? (a) : (b))
+
+#ifndef __cplusplus
+
+/** Get the minimal value */
+#undef min
+#define min(x,y) MIN(x, y)
+
+/** Get the maximal value */
+#undef max
+#define max(x,y) MAX(x, y)
+
+#endif
+
+/** Defines a soft breakpoint */
+#if (defined(__i386__) || defined(__x86_64__))
+#define DH_BREAKPOINT __asm__("int $0x03")
+#elif defined(__has_builtin)
+#if __has_builtin(__builtin_debugtrap)
+#define DH_BREAKPOINT __builtin_debugtrap()
+#endif
+#endif
+
+#ifndef DH_BREAKPOINT
+#define DH_BREAKPOINT
+#endif
+
+/* Backwards compat */
+#define BREAKPOINT DH_BREAKPOINT
+
+
+/* Error return/goto debug helpers */
+#ifdef TRACE_ERR
+#define PRINT_TRACE_ERR(err)						\
+		(void)re_fprintf(stderr, "TRACE_ERR: %s:%u: %s():"	\
+			      " %m (%d)\n",				\
+			      __FILE__, __LINE__, __func__,		\
+			      (err), (err));
+#else
+#define PRINT_TRACE_ERR(err)
+#endif
+
+#define IF_ERR_GOTO_OUT(err)		\
+	if ((err)) {			\
+		PRINT_TRACE_ERR((err))	\
+		goto out;		\
+	}
+
+#define IF_ERR_GOTO_OUT1(err)		\
+	if ((err)) {			\
+		PRINT_TRACE_ERR((err))	\
+		goto out1;		\
+	}
+
+#define IF_ERR_GOTO_OUT2(err)		\
+	if ((err)) {			\
+		PRINT_TRACE_ERR((err))	\
+		goto out2;		\
+	}
+
+#define IF_ERR_RETURN(err)		\
+	if ((err)) {			\
+		PRINT_TRACE_ERR((err))	\
+		return (err);		\
+	}
+
+#define IF_RETURN_EINVAL(exp)		\
+	if ((exp)) {			\
+		PRINT_TRACE_ERR(EINVAL)	\
+		return (EINVAL);	\
+	}
+
+#define RETURN_ERR(err)			\
+	if ((err)) {			\
+		PRINT_TRACE_ERR((err))	\
+	}				\
+	return (err);
+
+
+/* Error codes */
+#include <errno.h>
+
+/* Duplication of error codes. Values are from linux asm-generic/errno.h */
+
+/** No data available */
+#ifndef ENODATA
+#define ENODATA 200
+#endif
+
+/** Accessing a corrupted shared library */
+#ifndef ELIBBAD
+#define ELIBBAD 204
+#endif
+
+/** Destination address required */
+#ifndef EDESTADDRREQ
+#define EDESTADDRREQ 205
+#endif
+
+/** Protocol not supported */
+#ifndef EPROTONOSUPPORT
+#define EPROTONOSUPPORT 206
+#endif
+
+/** Operation not supported */
+#ifndef ENOTSUP
+#define ENOTSUP 207
+#endif
+
+/** Address family not supported by protocol */
+#ifndef EAFNOSUPPORT
+#define EAFNOSUPPORT 208
+#endif
+
+/** Cannot assign requested address */
+#ifndef EADDRNOTAVAIL
+#define EADDRNOTAVAIL 209
+#endif
+
+/** Software caused connection abort */
+#ifndef ECONNABORTED
+#define ECONNABORTED 210
+#endif
+
+/** Connection reset by peer */
+#ifndef ECONNRESET
+#define ECONNRESET 211
+#endif
+
+/** Transport endpoint is not connected */
+#ifndef ENOTCONN
+#define ENOTCONN 212
+#endif
+
+/** Connection timed out */
+#ifndef ETIMEDOUT
+#define ETIMEDOUT 213
+#endif
+
+/** Connection refused */
+#ifndef ECONNREFUSED
+#define ECONNREFUSED 214
+#endif
+
+/** Operation already in progress */
+#ifndef EALREADY
+#define EALREADY 215
+#endif
+
+/** Operation now in progress */
+#ifndef EINPROGRESS
+#define EINPROGRESS 216
+#endif
+
+/** Authentication error */
+#ifndef EAUTH
+#define EAUTH 217
+#endif
+
+/** No STREAM resources */
+#ifndef ENOSR
+#define ENOSR 218
+#endif
+
+/** Key was rejected by service */
+#ifndef EKEYREJECTED
+#define EKEYREJECTED 129
+#endif
+
+/** Cannot send after transport endpoint shutdown */
+#ifndef ESHUTDOWN
+#define ESHUTDOWN 108
+#endif
+
+/*
+ * Give the compiler a hint which branch is "likely" or "unlikely" (inspired
+ * by linux kernel and C++20/C2X)
+ */
+#ifdef __GNUC__
+#define likely(x)       __builtin_expect(!!(x), 1)
+#define unlikely(x)     __builtin_expect(!!(x), 0)
+#else
+#define likely(x) x
+#define unlikely(x) x
+#endif
+
+#ifdef WIN32
+#define re_restrict __restrict
+#else
+#define re_restrict restrict
+#endif
+
+/* Socket helpers */
+#ifdef WIN32
+#define DH_ERRNO_SOCK WSAGetLastError()
+#define DH_BAD_SOCK INVALID_SOCKET
+typedef size_t re_sock_t;
+#else
+#define DH_ERRNO_SOCK errno
+#define DH_BAD_SOCK -1
+typedef int re_sock_t;
+#endif
+
+
+/* re_assert helpers */
+
+/**
+ * @def re_assert(expr)
+ *
+ * If expression is false, prints error and calls abort() (not in
+ * RELEASE/NDEBUG builds)
+ *
+ * @param expr   expression
+ */
+
+
+/**
+ * @def re_assert_se(expr)
+ *
+ * If expression is false, prints error and calls abort(),
+ * in RELEASE/NDEBUG builds expression is always executed and keeps side effect
+ *
+ * @param expr   expression
+ */
+
+#if defined(RELEASE) || defined(NDEBUG)
+#define re_assert(expr) (void)0
+#define re_assert_se(expr) do{(void)(expr);} while(false)
+#else
+#define re_assert(expr) assert(expr)
+#define re_assert_se(expr) assert(expr)
+#endif
+
+
+/* DH_VA_ARG SIZE helpers */
+#if !defined(DISABLE_DH_ARG) &&                                               \
+	!defined(__STRICT_ANSI__) && /* Needs ## trailing comma fix, with C23 \
+					we can use __VA_OPT__ */              \
+	__STDC_VERSION__ >= 201112L  /* _Generic C11 support required */
+
+#define HAVE_DH_ARG 1
+
+#define DH_ARG_SIZE(type)                                                     \
+	_Generic((0)?(type):(type),                                           \
+	bool:			sizeof(int),                                  \
+	char:			sizeof(int),                                  \
+	unsigned char:		sizeof(unsigned int),                         \
+	short:			sizeof(int),                                  \
+	unsigned short:		sizeof(unsigned int),	                      \
+	int:			sizeof(int),                                  \
+	unsigned int:		sizeof(unsigned int),                         \
+	long:			sizeof(long),                                 \
+	unsigned long:		sizeof(unsigned long),                        \
+	long long:		sizeof(long long),                            \
+	unsigned long long:	sizeof(unsigned long long),                   \
+	float:			sizeof(double),                               \
+	double:			sizeof(double),                               \
+	char const*:		sizeof(char const *),                         \
+	char*:			sizeof(char *),                               \
+	void const*:		sizeof(void const *),                         \
+	void*:			sizeof(void *),                               \
+	struct pl:		sizeof(struct pl),                            \
+	default: sizeof(void*)                                                \
+)
+
+#define DH_ARG_0() 0
+#define DH_ARG_1(expr) DH_ARG_SIZE(expr), (expr), 0
+#define DH_ARG_2(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_1(__VA_ARGS__)
+#define DH_ARG_3(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_2(__VA_ARGS__)
+#define DH_ARG_4(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_3(__VA_ARGS__)
+#define DH_ARG_5(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_4(__VA_ARGS__)
+#define DH_ARG_6(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_5(__VA_ARGS__)
+#define DH_ARG_7(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_6(__VA_ARGS__)
+#define DH_ARG_8(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_7(__VA_ARGS__)
+#define DH_ARG_9(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_8(__VA_ARGS__)
+#define DH_ARG_10(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_9(__VA_ARGS__)
+#define DH_ARG_11(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_10(__VA_ARGS__)
+#define DH_ARG_12(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_11(__VA_ARGS__)
+#define DH_ARG_13(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_12(__VA_ARGS__)
+#define DH_ARG_14(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_13(__VA_ARGS__)
+#define DH_ARG_15(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_14(__VA_ARGS__)
+#define DH_ARG_16(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_15(__VA_ARGS__)
+#define DH_ARG_17(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_16(__VA_ARGS__)
+#define DH_ARG_18(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_17(__VA_ARGS__)
+#define DH_ARG_19(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_18(__VA_ARGS__)
+#define DH_ARG_20(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_19(__VA_ARGS__)
+#define DH_ARG_21(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_20(__VA_ARGS__)
+#define DH_ARG_22(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_21(__VA_ARGS__)
+#define DH_ARG_23(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_22(__VA_ARGS__)
+#define DH_ARG_24(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_23(__VA_ARGS__)
+#define DH_ARG_25(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_24(__VA_ARGS__)
+#define DH_ARG_26(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_25(__VA_ARGS__)
+#define DH_ARG_27(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_26(__VA_ARGS__)
+#define DH_ARG_28(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_27(__VA_ARGS__)
+#define DH_ARG_29(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_28(__VA_ARGS__)
+#define DH_ARG_30(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_29(__VA_ARGS__)
+#define DH_ARG_31(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_30(__VA_ARGS__)
+#define DH_ARG_32(expr, ...) DH_ARG_SIZE(expr), (expr), DH_ARG_31(__VA_ARGS__)
+
+#define DH_ARG_VA_NUM_2(X, X32, X31, X30, X29, X28, X27, X26, X25, X24, X23,  \
+			X22, X21, X20, X19, X18, X17, X16, X15, X14, X13,     \
+			X12, X11, X10, X9, X8, X7, X6, X5, X4, X3, X2, X1, N, \
+			...)                                                  \
+	N
+#define DH_ARG_VA_NUM(...)                                                    \
+	DH_ARG_VA_NUM_2(0, ##__VA_ARGS__, 32, 31, 30, 29, 28, 27, 26, 25, 24, \
+			23, 22, 21, 20, 19, 18, 17, 16, 15, 14, 13, 12, 11,   \
+			10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0)
+
+#define DH_ARG_N3(N, ...) DH_ARG_##N(__VA_ARGS__)
+#define DH_ARG_N2(N, ...) DH_ARG_N3(N, __VA_ARGS__)
+#define DH_VA_ARGS(...) DH_ARG_N2(DH_ARG_VA_NUM(__VA_ARGS__), __VA_ARGS__)
+#endif /* End DH_VA_ARG SIZE helpers */
+
+#define DH_VA_ARG(ap, val, type, safe)                                        \
+	if (likely((safe))) {                                                 \
+		size_t sz = va_arg((ap), size_t);                             \
+		if (unlikely(!sz)) {                                          \
+			err = ENODATA;                                        \
+			goto out;                                             \
+		}                                                             \
+		if (unlikely(sz != sizeof(type))) {                           \
+			err = EOVERFLOW;                                      \
+			goto out;                                             \
+		}                                                             \
+	}                                                                     \
+	(val) = va_arg((ap), type)
--- a/duix-sdk/src/main/cpp/dhcore/lightweightsemaphore.h
+++ b/duix-sdk/src/main/cpp/dhcore/lightweightsemaphore.h
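@ -0,0 +1,434 @@
+// Provides an efficient implementation of a semaphore (LightweightSemaphore).
+// This is an extension of Jeff Preshing's semaphore implementation (licensed
+// under the terms of its separate zlib license) that has been adapted and
+// extended by Cameron Desrochers.
+
+#pragma once
+
+#include <cassert> // For assert
+#include <cerrno> // For errno, EINTR
+#include <cstddef> // For std::size_t
+#include <cstdint> // For std::int64_t, std::uint64_t
+#include <atomic>
+#include <type_traits> // For std::make_signed<T>
+
+#ifndef MOODYCAMEL_DELETE_FUNCTION // normally provided by concurrentqueue.h
+#define MOODYCAMEL_DELETE_FUNCTION = delete
+#endif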
+
+#if defined(_WIN32)
+// Avoid including windows.h in a header; we only need a handful of
+// items, so we'll redeclare them here (this is relatively safe since
+// the API generally has to remain stable between Windows versions).
+// I know this is an ugly hack but it still beats polluting the global
+// namespace with thousands of generic names or adding a .cpp for nothing.
+extern "C" {
+	struct _SECURITY_ATTRIBUTES;
+	__declspec(dllimport) void* __stdcall CreateSemaphoreW(_SECURITY_ATTRIBUTES* lpSemaphoreAttributes, long lInitialCount, long lMaximumCount, const wchar_t* lpName);
+	__declspec(dllimport) int __stdcall CloseHandle(void* hObject);
+	__declspec(dllimport) unsigned long __stdcall WaitForSingleObject(void* hHandle, unsigned long dwMilliseconds);
+	__declspec(dllimport) int __stdcall ReleaseSemaphore(void* hSemaphore, long lReleaseCount, long* lpPreviousCount);
+}
+#elif defined(__MACH__)
+#include <mach/mach.h>
+#elif defined(__MVS__)
+#include <zos-semaphore.h>
+#elif defined(__unix__)
+#include <semaphore.h>
+
+#if defined(__GLIBC_PREREQ) && defined(_GNU_SOURCE)
+#if __GLIBC_PREREQ(2,30)
+#define MOODYCAMEL_LIGHTWEIGHTSEMAPHORE_MONOTONIC
+#endif
+#endif
+#endif
+
+namespace moodycamel
+{
+namespace details
+{
+
+// Code in the mpmc_sema namespace below is an adaptation of Jeff Preshing's
+// portable + lightweight semaphore implementations, originally from
+// https://github.com/preshing/cpp11-on-multicore/blob/master/common/sema.h
+// LICENSE:
+// Copyright (c) 2015 Jeff Preshing
+//
+// This software is provided 'as-is', without any express or implied
+// warranty. In no event will the authors be held liable for any damages
+// arising from the use of this software.
+//
+// Permission is granted to anyone to use this software for any purpose,
+// including commercial applications, and to alter it and redistribute it
+// freely, subject to the following restrictions:
+//
+// 1. The origin of this software must not be misrepresented; you must not
+//	claim that you wrote the original software. If you use this software
+//	in a product, an acknowledgement in the product documentation would be
+//	appreciated but is not required.
+// 2. Altered source versions must be plainly marked as such, and must not be
+//	misrepresented as being the original software.
+// 3. This notice may not be removed or altered from any source distribution.
+#if defined(_WIN32)
+class Semaphore
+{
+private:
+	void* m_hSema;
+	
+	Semaphore(const Semaphore& other) MOODYCAMEL_DELETE_FUNCTION;
+	Semaphore& operator=(const Semaphore& other) MOODYCAMEL_DELETE_FUNCTION;
+
+public:
+	Semaphore(int initialCount = 0)
+	{
+		assert(initialCount >= 0);
+		const long maxLong = 0x7fffffff;
+		m_hSema = CreateSemaphoreW(nullptr, initialCount, maxLong, nullptr);
+		assert(m_hSema);
+	}
+
+	~Semaphore()
+	{
+		CloseHandle(m_hSema);
+	}
+
+	bool wait()
+	{
+		const unsigned long infinite = 0xffffffff;
+		return WaitForSingleObject(m_hSema, infinite) == 0;
+	}
+	
+	bool try_wait()
+	{
+		return WaitForSingleObject(m_hSema, 0) == 0;
+	}
+	
+	bool timed_wait(std::uint64_t usecs)
+	{
+		return WaitForSingleObject(m_hSema, (unsigned long)(usecs / 1000)) == 0;
+	}
+
+	void signal(int count = 1)
+	{
+		while (!ReleaseSemaphore(m_hSema, count, nullptr));
+	}
+};
+#elif defined(__MACH__)
+//---------------------------------------------------------
+// Semaphore (Apple iOS and OSX)
+// Can't use POSIX semaphores due to http://lists.apple.com/archives/darwin-kernel/2009/Apr/msg00010.html
+//---------------------------------------------------------
+class Semaphore
+{
+private:
+	semaphore_t m_sema;
+
+	Semaphore(const Semaphore& other) MOODYCAMEL_DELETE_FUNCTION;
+	Semaphore& operator=(const Semaphore& other) MOODYCAMEL_DELETE_FUNCTION;
+
+public:
+	Semaphore(int initialCount = 0)
+	{
+		assert(initialCount >= 0);
+		kern_return_t rc = semaphore_create(mach_task_self(), &m_sema, SYNC_POLICY_FIFO, initialCount);
+		assert(rc == KERN_SUCCESS);
+		(void)rc;
+	}
+
+	~Semaphore()
+	{
+		semaphore_destroy(mach_task_self(), m_sema);
+	}
+
+	bool wait()
+	{
+		return semaphore_wait(m_sema) == KERN_SUCCESS;
+	}
+	
+	bool try_wait()
+	{
+		return timed_wait(0);
+	}
+	
+	bool timed_wait(std::uint64_t timeout_usecs)
+	{
+		mach_timespec_t ts;
+		ts.tv_sec = static_cast<unsigned int>(timeout_usecs / 1000000);
+		ts.tv_nsec = static_cast<int>((timeout_usecs % 1000000) * 1000);
+
+		// added in OSX 10.10: https://developer.apple.com/library/prerelease/mac/documentation/General/Reference/APIDiffsMacOSX10_10SeedDiff/modules/Darwin.html
+		kern_return_t rc = semaphore_timedwait(m_sema, ts);
+		return rc == KERN_SUCCESS;
+	}
+
+	void signal()
+	{
+		while (semaphore_signal(m_sema) != KERN_SUCCESS);
+	}
+
+	void signal(int count)
+	{
+		while (count-- > 0)
+		{
+			while (semaphore_signal(m_sema) != KERN_SUCCESS);
+		}
+	}
+};
+#elif defined(__unix__) || defined(__MVS__)
+//---------------------------------------------------------
+// Semaphore (POSIX, Linux, zOS)
+//---------------------------------------------------------
+class Semaphore
+{
+private:
+	sem_t m_sema;
+
+	Semaphore(const Semaphore& other) MOODYCAMEL_DELETE_FUNCTION;
+	Semaphore& operator=(const Semaphore& other) MOODYCAMEL_DELETE_FUNCTION;
+
+public:
+	Semaphore(int initialCount = 0)
+	{
+		assert(initialCount >= 0);
+		int rc = sem_init(&m_sema, 0, static_cast<unsigned int>(initialCount));
+		assert(rc == 0);
+		(void)rc;
+	}
+
+	~Semaphore()
+	{
+		sem_destroy(&m_sema);
+	}
+
+	bool wait()
+	{
+		// http://stackoverflow.com/questions/2013181/gdb-causes-sem-wait-to-fail-with-eintr-error
+		int rc;
+		do {
+			rc = sem_wait(&m_sema);
+		} while (rc == -1 && errno == EINTR);
+		return rc == 0;
+	}
+
+	bool try_wait()
+	{
+		int rc;
+		do {
+			rc = sem_trywait(&m_sema);
+		} while (rc == -1 && errno == EINTR);
+		return rc == 0;
+	}
+
+	bool timed_wait(std::uint64_t usecs)
+	{
+		struct timespec ts;
+		const int usecs_in_1_sec = 1000000;
+		const int nsecs_in_1_sec = 1000000000;
+#ifdef MOODYCAMEL_LIGHTWEIGHTSEMAPHORE_MONOTONIC
+		clock_gettime(CLOCK_MONOTONIC, &ts);
+#else
+		clock_gettime(CLOCK_REALTIME, &ts);
+#endif
+		ts.tv_sec += (time_t)(usecs / usecs_in_1_sec);
+		ts.tv_nsec += (long)(usecs % usecs_in_1_sec) * 1000;
+		// sem_timedwait bombs if you have more than 1e9 in tv_nsec
+		// so we have to clean things up before passing it in
+		if (ts.tv_nsec >= nsecs_in_1_sec) {
+			ts.tv_nsec -= nsecs_in_1_sec;
+			++ts.tv_sec;
+		}
+
+		int rc;
+		do {
+#ifdef MOODYCAMEL_LIGHTWEIGHTSEMAPHORE_MONOTONIC
+			rc = sem_clockwait(&m_sema, CLOCK_MONOTONIC, &ts);
+#else
+			rc = sem_timedwait(&m_sema, &ts);
+#endif
+		} while (rc == -1 && errno == EINTR);
+		return rc == 0;
+	}
+
+	void signal()
+	{
+		while (sem_post(&m_sema) == -1);
+	}
+
+	void signal(int count)
+	{
+		while (count-- > 0)
+		{
+			while (sem_post(&m_sema) == -1);
+		}
+	}
+};
+#else
+#error Unsupported platform! (No semaphore wrapper available)
+#endif
+
+}	// end namespace details
+
+
+//---------------------------------------------------------
+// LightweightSemaphore
+//---------------------------------------------------------
+class LightweightSemaphore
+{
+public:
+	typedef std::make_signed<std::size_t>::type ssize_t;
+
+private:
+	std::atomic<ssize_t> m_count;
+	details::Semaphore m_sema;
+	int m_maxSpins;
+
+	bool waitWithPartialSpinning(std::int64_t timeout_usecs = -1)
+	{
+		ssize_t oldCount;
+		int spin = m_maxSpins;
+		while (--spin >= 0)
+		{
+			oldCount = m_count.load(std::memory_order_relaxed);
+			if ((oldCount > 0) && m_count.compare_exchange_strong(oldCount, oldCount - 1, std::memory_order_acquire, std::memory_order_relaxed))
+				return true;
+			std::atomic_signal_fence(std::memory_order_acquire);	 // Prevent the compiler from collapsing the loop.
+		}
+		oldCount = m_count.fetch_sub(1, std::memory_order_acquire);
+		if (oldCount > 0)
+			return true;
+		if (timeout_usecs < 0)
+		{
+			if (m_sema.wait())
+				return true;
+		}
+		if (timeout_usecs > 0 && m_sema.timed_wait((std::uint64_t)timeout_usecs))
+			return true;
+		// At this point, we've timed out waiting for the semaphore, but the
+		// count is still decremented indicating we may still be waiting on
+		// it. So we have to re-adjust the count, but only if the semaphore
+		// wasn't signaled enough times for us since then. If it was, we
+		// need to release the semaphore too.
+		while (true)
+		{
+			oldCount = m_count.load(std::memory_order_acquire);
+			if (oldCount >= 0 && m_sema.try_wait())
+				return true;
+			if (oldCount < 0 && m_count.compare_exchange_strong(oldCount, oldCount + 1, std::memory_order_relaxed, std::memory_order_relaxed))
+				return false;
+		}
+	}
+
+	ssize_t waitManyWithPartialSpinning(ssize_t max, std::int64_t timeout_usecs = -1)
+	{
+		assert(max > 0);
+		ssize_t oldCount;
+		int spin = m_maxSpins;
+		while (--spin >= 0)
+		{
+			oldCount = m_count.load(std::memory_order_relaxed);
+			if (oldCount > 0)
+			{
+				ssize_t newCount = oldCount > max ? oldCount - max : 0;
+				if (m_count.compare_exchange_strong(oldCount, newCount, std::memory_order_acquire, std::memory_order_relaxed))
+					return oldCount - newCount;
+			}
+			std::atomic_signal_fence(std::memory_order_acquire);
+		}
+		oldCount = m_count.fetch_sub(1, std::memory_order_acquire);
+		if (oldCount <= 0)
+		{
+			if ((timeout_usecs == 0) || (timeout_usecs < 0 && !m_sema.wait()) || (timeout_usecs > 0 && !m_sema.timed_wait((std::uint64_t)timeout_usecs)))
+			{
+				while (true)
+				{
+					oldCount = m_count.load(std::memory_order_acquire);
+					if (oldCount >= 0 && m_sema.try_wait())
+						break;
+					if (oldCount < 0 && m_count.compare_exchange_strong(oldCount, oldCount + 1, std::memory_order_relaxed, std::memory_order_relaxed))
+						return 0;
+				}
+			}
+		}
+		if (max > 1)
+			return 1 + tryWaitMany(max - 1);
+		return 1;
+	}
+
+public:
+	LightweightSemaphore(ssize_t initialCount = 0, int maxSpins = 10000) : m_count(initialCount), m_maxSpins(maxSpins)
+	{
+		assert(initialCount >= 0);
+		assert(maxSpins >= 0);
+	}
+
+	bool tryWait()
+	{
+		ssize_t oldCount = m_count.load(std::memory_order_relaxed);
+		while (oldCount > 0)
+		{
+			if (m_count.compare_exchange_weak(oldCount, oldCount - 1, std::memory_order_acquire, std::memory_order_relaxed))
+				return true;
+		}
+		return false;
+	}
+
+	bool wait()
+	{
+		return tryWait() || waitWithPartialSpinning();
+	}
+
+	bool wait(std::int64_t timeout_usecs)
+	{
+		return tryWait() || waitWithPartialSpinning(timeout_usecs);
+	}
+
+	// Acquires between 0 and (greedily) max, inclusive
+	ssize_t tryWaitMany(ssize_t max)
+	{
+		assert(max >= 0);
+		ssize_t oldCount = m_count.load(std::memory_order_relaxed);
+		while (oldCount > 0)
+		{
+			ssize_t newCount = oldCount > max ? oldCount - max : 0;
+			if (m_count.compare_exchange_weak(oldCount, newCount, std::memory_order_acquire, std::memory_order_relaxed))
+				return oldCount - newCount;
+		}
+		return 0;
+	}
+
+	// Acquires at least one, and (greedily) at most max
+	ssize_t waitMany(ssize_t max, std::int64_t timeout_usecs)
+	{
+		assert(max >= 0);
+		ssize_t result = tryWaitMany(max);
+		if (result == 0 && max > 0)
+			result = waitManyWithPartialSpinning(max, timeout_usecs);
+		return result;
+	}
+	
+	ssize_t waitMany(ssize_t max)
+	{
+		ssize_t result = waitMany(max, -1);
+		assert(result > 0);
+		return result;
+	}
+
+	void signal(ssize_t count = 1)
+	{
+		assert(count >= 0);
+		ssize_t oldCount = m_count.fetch_add(count, std::memory_order_release);
+		ssize_t toRelease = -oldCount < count ? -oldCount : count;
+		if (toRelease > 0)
+		{
+			m_sema.signal((int)toRelease);
+		}
+	}
+	
+	std::size_t availableApprox() const
+	{
+		ssize_t count = m_count.load(std::memory_order_relaxed);
+		return count > 0 ? static_cast<std::size_t>(count) : 0;
+	}
+};
+
+}   // end namespace moodycamel
--- a/duix-sdk/src/main/cpp/dhcore/readerwritercircularbuffer.h
+++ b/duix-sdk/src/main/cpp/dhcore/readerwritercircularbuffer.h
@ -0,0 +1,321 @@
+// ©2020 Cameron Desrochers.
+// Distributed under the simplified BSD license (see the license file that
+// should have come with this header).
+
+// Provides a C++11 implementation of a single-producer, single-consumer wait-free concurrent
+// circular buffer (fixed-size queue).
+
+#pragma once
+
+#include <utility>
+#include <chrono>
+#include <memory>
+#include <cstdlib>
+#include <cstdint>
+#include <cassert>
+
+// Note that this implementation is fully modern C++11 (not compatible with old MSVC versions)
+// but we still include atomicops.h for its LightweightSemaphore implementation.
+#include "atomicops.h"
+
+#ifndef MOODYCAMEL_CACHE_LINE_SIZE
+#define MOODYCAMEL_CACHE_LINE_SIZE 64
+#endif
+
+namespace moodycamel {
+
+template<typename T>
+class BlockingReaderWriterCircularBuffer
+{
+public:
+	typedef T value_type;
+
+public:
+	explicit BlockingReaderWriterCircularBuffer(std::size_t capacity)
+		: maxcap(capacity), mask(), rawData(), data(),
+		slots_(new spsc_sema::LightweightSemaphore(static_cast<spsc_sema::LightweightSemaphore::ssize_t>(capacity))),
+		items(new spsc_sema::LightweightSemaphore(0)),
+		nextSlot(0), nextItem(0)
+	{
+		// Round capacity up to power of two to compute modulo mask.
+		// Adapted from http://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2
+		--capacity;
+		capacity |= capacity >> 1;
+		capacity |= capacity >> 2;
+		capacity |= capacity >> 4;
+		for (std::size_t i = 1; i < sizeof(std::size_t); i <<= 1)
+			capacity |= capacity >> (i << 3);
+		mask = capacity++;
+		rawData = static_cast<char*>(std::malloc(capacity * sizeof(T) + std::alignment_of<T>::value - 1));
+		data = align_for<T>(rawData);
+	}
+
+	BlockingReaderWriterCircularBuffer(BlockingReaderWriterCircularBuffer&& other)
+		: maxcap(0), mask(0), rawData(nullptr), data(nullptr),
+		slots_(new spsc_sema::LightweightSemaphore(0)),
+		items(new spsc_sema::LightweightSemaphore(0)),
+		nextSlot(), nextItem()
+	{
+		swap(other);
+	}
+
+	BlockingReaderWriterCircularBuffer(BlockingReaderWriterCircularBuffer const&) = delete;
+
+	// Note: The queue should not be accessed concurrently while it's
+	// being deleted. It's up to the user to synchronize this.
+	~BlockingReaderWriterCircularBuffer()
+	{
+		for (std::size_t i = 0, n = items->availableApprox(); i != n; ++i)
+			reinterpret_cast<T*>(data)[(nextItem + i) & mask].~T();
+		std::free(rawData);
+	}
+
+	BlockingReaderWriterCircularBuffer& operator=(BlockingReaderWriterCircularBuffer&& other) noexcept
+	{
+		swap(other);
+		return *this;
+	}
+
+	BlockingReaderWriterCircularBuffer& operator=(BlockingReaderWriterCircularBuffer const&) = delete;
+
+	// Swaps the contents of this buffer with the contents of another.
+	// Not thread-safe.
+	void swap(BlockingReaderWriterCircularBuffer& other) noexcept
+	{
+		std::swap(maxcap, other.maxcap);
+		std::swap(mask, other.mask);
+		std::swap(rawData, other.rawData);
+		std::swap(data, other.data);
+		std::swap(slots_, other.slots_);
+		std::swap(items, other.items);
+		std::swap(nextSlot, other.nextSlot);
+		std::swap(nextItem, other.nextItem);
+	}
+
+	// Enqueues a single item (by copying it).
+	// Fails if not enough room to enqueue.
+	// Thread-safe when called by producer thread.
+	// No exception guarantee (state will be corrupted) if constructor of T throws.
+	bool try_enqueue(T const& item)
+	{
+		if (!slots_->tryWait())
+			return false;
+		inner_enqueue(item);
+		return true;
+	}
+
+	// Enqueues a single item (by moving it, if possible).
+	// Fails if not enough room to enqueue.
+	// Thread-safe when called by producer thread.
+	// No exception guarantee (state will be corrupted) if constructor of T throws.
+	bool try_enqueue(T&& item)
+	{
+		if (!slots_->tryWait())
+			return false;
+		inner_enqueue(std::move(item));
+		return true;
+	}
+
+	// Blocks the current thread until there's enough space to enqueue the given item,
+	// then enqueues it (via copy).
+	// Thread-safe when called by producer thread.
+	// No exception guarantee (state will be corrupted) if constructor of T throws.
+	void wait_enqueue(T const& item)
+	{
+		while (!slots_->wait());
+		inner_enqueue(item);
+	}
+
+	// Blocks the current thread until there's enough space to enqueue the given item,
+	// then enqueues it (via move, if possible).
+	// Thread-safe when called by producer thread.
+	// No exception guarantee (state will be corrupted) if constructor of T throws.
+	void wait_enqueue(T&& item)
+	{
+		while (!slots_->wait());
+		inner_enqueue(std::move(item));
+	}
+
+	// Blocks the current thread until there's enough space to enqueue the given item,
+	// or the timeout expires. Returns false without enqueueing the item if the timeout
+	// expires, otherwise enqueues the item (via copy) and returns true.
+	// Thread-safe when called by producer thread.
+	// No exception guarantee (state will be corrupted) if constructor of T throws.
+	bool wait_enqueue_timed(T const& item, std::int64_t timeout_usecs)
+	{
+		if (!slots_->wait(timeout_usecs))
+			return false;
+		inner_enqueue(item);
+		return true;
+	}
+
+	// Blocks the current thread until there's enough space to enqueue the given item,
+	// or the timeout expires. Returns false without enqueueing the item if the timeout
+	// expires, otherwise enqueues the item (via move, if possible) and returns true.
+	// Thread-safe when called by producer thread.
+	// No exception guarantee (state will be corrupted) if constructor of T throws.
+	bool wait_enqueue_timed(T&& item, std::int64_t timeout_usecs)
+	{
+		if (!slots_->wait(timeout_usecs))
+			return false;
+		inner_enqueue(std::move(item));
+		return true;
+	}
+
+	// Blocks the current thread until there's enough space to enqueue the given item,
+	// or the timeout expires. Returns false without enqueueing the item if the timeout
+	// expires, otherwise enqueues the item (via copy) and returns true.
+	// Thread-safe when called by producer thread.
+	// No exception guarantee (state will be corrupted) if constructor of T throws.
+	template<typename Rep, typename Period>
+	inline bool wait_enqueue_timed(T const& item, std::chrono::duration<Rep, Period> const& timeout)
+	{
+		return wait_enqueue_timed(item, std::chrono::duration_cast<std::chrono::microseconds>(timeout).count());
+	}
+
+	// Blocks the current thread until there's enough space to enqueue the given item,
+	// or the timeout expires. Returns false without enqueueing the item if the timeout
+	// expires, otherwise enqueues the item (via move, if possible) and returns true.
+	// Thread-safe when called by producer thread.
+	// No exception guarantee (state will be corrupted) if constructor of T throws.
+	template<typename Rep, typename Period>
+	inline bool wait_enqueue_timed(T&& item, std::chrono::duration<Rep, Period> const& timeout)
+	{
+		return wait_enqueue_timed(std::move(item), std::chrono::duration_cast<std::chrono::microseconds>(timeout).count());
+	}
+
+	// Attempts to dequeue a single item.
+	// Returns false if the buffer is empty.
+	// Thread-safe when called by consumer thread.
+	// No exception guarantee (state will be corrupted) if assignment operator of U throws.
+	template<typename U>
+	bool try_dequeue(U& item)
+	{
+		if (!items->tryWait())
+			return false;
+		inner_dequeue(item);
+		return true;
+	}
+
+	// Blocks the current thread until there's something to dequeue, then dequeues it.
+	// Thread-safe when called by consumer thread.
+	// No exception guarantee (state will be corrupted) if assignment operator of U throws.
+	template<typename U>
+	void wait_dequeue(U& item)
+	{
+		while (!items->wait());
+		inner_dequeue(item);
+	}
+
+	// Blocks the current thread until either there's something to dequeue
+	// or the timeout expires. Returns false without setting `item` if the
+	// timeout expires, otherwise assigns to `item` and returns true.
+	// Thread-safe when called by consumer thread.
+	// No exception guarantee (state will be corrupted) if assignment operator of U throws.
+	template<typename U>
+	bool wait_dequeue_timed(U& item, std::int64_t timeout_usecs)
+	{
+		if (!items->wait(timeout_usecs))
+			return false;
+		inner_dequeue(item);
+		return true;
+	}
+
+	// Blocks the current thread until either there's something to dequeue
+	// or the timeout expires. Returns false without setting `item` if the
+	// timeout expires, otherwise assigns to `item` and returns true.
+	// Thread-safe when called by consumer thread.
+	// No exception guarantee (state will be corrupted) if assignment operator of U throws.
+	template<typename U, typename Rep, typename Period>
+	inline bool wait_dequeue_timed(U& item, std::chrono::duration<Rep, Period> const& timeout)
+	{
+		return wait_dequeue_timed(item, std::chrono::duration_cast<std::chrono::microseconds>(timeout).count());
+	}
+
+	// Returns a pointer to the next element in the queue (the one that would
+	// be removed next by a call to `try_dequeue` or `try_pop`). If the queue
+	// appears empty at the time the method is called, returns nullptr instead.
+	// Thread-safe when called by consumer thread.
+	inline T* peek()
+	{
+		if (!items->availableApprox())
+			return nullptr;
+		return inner_peek();
+	}
+
+	// Pops the next element from the queue, if there is one.
+	// Thread-safe when called by consumer thread.
+	inline bool try_pop()
+	{
+		if (!items->tryWait())
+			return false;
+		inner_pop();
+		return true;
+	}
+
+	// Returns a (possibly outdated) snapshot of the total number of elements currently in the buffer.
+	// Thread-safe.
+	inline std::size_t size_approx() const
+	{
+		return items->availableApprox();
+	}
+
+	// Returns the maximum number of elements that this circular buffer can hold at once.
+	// Thread-safe.
+	inline std::size_t max_capacity() const
+	{
+		return maxcap;
+	}
+
+private:
+	template<typename U>
+	void inner_enqueue(U&& item)
+	{
+		std::size_t i = nextSlot++;
+		new (reinterpret_cast<T*>(data) + (i & mask)) T(std::forward<U>(item));
+		items->signal();
+	}
+
+	template<typename U>
+	void inner_dequeue(U& item)
+	{
+		std::size_t i = nextItem++;
+		T& element = reinterpret_cast<T*>(data)[i & mask];
+		item = std::move(element);
+		element.~T();
+		slots_->signal();
+	}
+
+	T* inner_peek()
+	{
+		return reinterpret_cast<T*>(data) + (nextItem & mask);
+	}
+
+	void inner_pop()
+	{
+		std::size_t i = nextItem++;
+		reinterpret_cast<T*>(data)[i & mask].~T();
+		slots_->signal();
+	}
+
+	template<typename U>
+	static inline char* align_for(char* ptr)
+	{
+		const std::size_t alignment = std::alignment_of<U>::value;
+		return ptr + (alignment - (reinterpret_cast<std::uintptr_t>(ptr) % alignment)) % alignment;
+	}
+
+private:
+	std::size_t maxcap;                           // actual (non-power-of-two) capacity
+	std::size_t mask;                             // circular buffer capacity mask (for cheap modulo)
+	char* rawData;                                // raw circular buffer memory
+	char* data;                                   // circular buffer memory aligned to element alignment
+	std::unique_ptr<spsc_sema::LightweightSemaphore> slots_;  // number of slots currently free (named with underscore to accommodate Qt's 'slots' macro)
+	std::unique_ptr<spsc_sema::LightweightSemaphore> items;   // number of elements currently enqueued
+	char cachelineFiller0[MOODYCAMEL_CACHE_LINE_SIZE - sizeof(char*) * 2 - sizeof(std::size_t) * 2 - sizeof(std::unique_ptr<spsc_sema::LightweightSemaphore>) * 2];
+	std::size_t nextSlot;                         // index of next free slot to enqueue into
+	char cachelineFiller1[MOODYCAMEL_CACHE_LINE_SIZE - sizeof(std::size_t)];
+	std::size_t nextItem;                         // index of next element to dequeue from
+};
+
+}
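The circular buffer above combines a power-of-two index mask (so `i & mask` replaces `i % capacity`) with two counting semaphores: `slots_` counts free cells and `items` counts filled ones. The block below is a minimal standalone sketch of that same two-semaphore design under stated assumptions. It is not the moodycamel implementation. `SpscRing` and `CountingSemaphore` are hypothetical names. The semaphore is a mutex/condition-variable stand-in for `spsc_sema::LightweightSemaphore`. Storage is a `std::vector<T>` rather than raw aligned memory with placement new, so `T` must be default-constructible and move-assignable.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for spsc_sema::LightweightSemaphore:
// a plain counting semaphore built from a mutex and a condition variable.
class CountingSemaphore {
public:
    explicit CountingSemaphore(std::size_t initial) : count_(initial) {}
    void signal() {
        { std::lock_guard<std::mutex> lock(m_); ++count_; }
        cv_.notify_one();
    }
    void wait() {
        std::unique_lock<std::mutex> lock(m_);
        cv_.wait(lock, [this] { return count_ > 0; });
        --count_;
    }
    bool tryWait() {
        std::lock_guard<std::mutex> lock(m_);
        if (count_ == 0) return false;
        --count_;
        return true;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::size_t count_;
};

// Hypothetical bounded single-producer/single-consumer ring buffer that
// mirrors the slots_/items/nextSlot/nextItem layout of the class above.
template <typename T>
class SpscRing {
public:
    explicit SpscRing(std::size_t capacity)
        : maxcap_(capacity), mask_(roundUpPow2(capacity) - 1),
          buf_(mask_ + 1), slots_(capacity), items_(0) {}

    void wait_enqueue(T item) {
        slots_.wait();                        // block until a cell is free
        buf_[nextSlot_++ & mask_] = std::move(item);
        items_.signal();                      // publish one more element
    }
    bool try_dequeue(T& out) {
        if (!items_.tryWait()) return false;  // nothing to read yet
        out = std::move(buf_[nextItem_++ & mask_]);
        slots_.signal();                      // hand the cell back to the producer
        return true;
    }
    void wait_dequeue(T& out) {
        items_.wait();                        // block until an element exists
        out = std::move(buf_[nextItem_++ & mask_]);
        slots_.signal();
    }
    std::size_t max_capacity() const { return maxcap_; }

private:
    static std::size_t roundUpPow2(std::size_t x) {
        std::size_t p = 1;
        while (p < x) p <<= 1;
        return p;
    }
    std::size_t maxcap_;        // requested (possibly non-power-of-two) capacity
    std::size_t mask_;          // power-of-two size minus one, for cheap modulo
    std::vector<T> buf_;        // backing storage, size mask_ + 1
    CountingSemaphore slots_;   // free cells, starts at maxcap_
    CountingSemaphore items_;   // filled cells, starts at 0
    std::size_t nextSlot_ = 0;  // written only by the producer
    std::size_t nextItem_ = 0;  // written only by the consumer
};
```

As in the class above, each index is advanced by exactly one thread, and the semaphores supply both the blocking behavior and the ordering that makes a written element visible to the consumer before it reads the cell.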
--- a/duix-sdk/src/main/cpp/dhcore/readerwriterqueue.h
+++ b/duix-sdk/src/main/cpp/dhcore/readerwriterqueue.h
@ -0,0 +1,979 @@
+// ©2013-2020 Cameron Desrochers.
+// Distributed under the simplified BSD license (see the license file that
+// should have come with this header).
+
+#pragma once
+
+#include "atomicops.h"
+#include <new>
+#include <type_traits>
+#include <utility>
+#include <cassert>
+#include <stdexcept>
+#include <new>
+#include <cstdint>
+#include <cstdlib>		// For malloc/free/abort & size_t
+#include <memory>
+#if __cplusplus > 199711L || _MSC_VER >= 1700 // C++11 or VS2012
+#include <chrono>
+#endif
+
+
+// A lock-free queue for a single-consumer, single-producer architecture.
+// The queue is also wait-free in the common path (except if more memory
+// needs to be allocated, in which case malloc is called).
+// Allocates memory sparingly, and only once if the original maximum size
+// estimate is never exceeded.
+// Tested on x86/x64 processors, but semantics should be correct for all
+// architectures (given the right implementations in atomicops.h), provided
+// that aligned integer and pointer accesses are naturally atomic.
+// Note that there should only be one consumer thread and producer thread;
+// Switching roles of the threads, or using multiple consecutive threads for
+// one role, is not safe unless properly synchronized.
+// Using the queue exclusively from one thread is fine, though a bit silly.
+
+#ifndef MOODYCAMEL_CACHE_LINE_SIZE
+#define MOODYCAMEL_CACHE_LINE_SIZE 64
+#endif
+
+#ifndef MOODYCAMEL_EXCEPTIONS_ENABLED
+#if (defined(_MSC_VER) && defined(_CPPUNWIND)) || (defined(__GNUC__) && defined(__EXCEPTIONS)) || (!defined(_MSC_VER) && !defined(__GNUC__))
+//#define MOODYCAMEL_EXCEPTIONS_ENABLED
+#endif
+#endif
+
+#ifndef MOODYCAMEL_HAS_EMPLACE
+#if !defined(_MSC_VER) || _MSC_VER >= 1800 // variadic templates: either a non-MS compiler or VS >= 2013
+#define MOODYCAMEL_HAS_EMPLACE    1
+#endif
+#endif
+
+#ifndef MOODYCAMEL_MAYBE_ALIGN_TO_CACHELINE
+#if defined (__APPLE__) && defined (__MACH__) && __cplusplus >= 201703L
+// This is required to find out what deployment target we are using
+#include <AvailabilityMacros.h>
+#if !defined(MAC_OS_X_VERSION_MIN_REQUIRED) || !defined(MAC_OS_X_VERSION_10_14) || MAC_OS_X_VERSION_MIN_REQUIRED < MAC_OS_X_VERSION_10_14
+// C++17 new(size_t, align_val_t) is not backwards-compatible with older versions of macOS, so we can't support over-alignment in this case
+#define MOODYCAMEL_MAYBE_ALIGN_TO_CACHELINE
+#endif
+#endif
+#endif
+
+#ifndef MOODYCAMEL_MAYBE_ALIGN_TO_CACHELINE
+#define MOODYCAMEL_MAYBE_ALIGN_TO_CACHELINE AE_ALIGN(MOODYCAMEL_CACHE_LINE_SIZE)
+#endif
+
+#ifdef AE_VCPP
+#pragma warning(push)
+#pragma warning(disable: 4324)	// structure was padded due to __declspec(align())
+#pragma warning(disable: 4820)	// padding was added
+#pragma warning(disable: 4127)	// conditional expression is constant
+#endif
+
+namespace moodycamel {
+
+template<typename T, size_t MAX_BLOCK_SIZE = 512>
+class MOODYCAMEL_MAYBE_ALIGN_TO_CACHELINE ReaderWriterQueue
+{
+	// Design: Based on a queue-of-queues. The low-level queues are just
+	// circular buffers with front and tail indices indicating where the
+	// next element to dequeue is and where the next element can be enqueued,
+	// respectively. Each low-level queue is called a "block". Each block
+	// wastes exactly one element's worth of space to keep the design simple
+	// (if front == tail then the queue is empty, and can't be full).
+	// The high-level queue is a circular linked list of blocks; again there
+	// is a front and tail, but this time they are pointers to the blocks.
+	// The front block is where the next element to be dequeued is, provided
+	// the block is not empty. The back block is where elements are to be
+	// enqueued, provided the block is not full.
+	// The producer thread owns all the tail indices/pointers. The consumer
+	// thread owns all the front indices/pointers. Both threads read each
+	// other's variables, but only the owning thread updates them. E.g., after
+	// the consumer reads the producer's tail, the tail may change before the
+	// consumer is done dequeuing an object, but the consumer knows the tail
+	// will never go backwards, only forwards.
+	// If there is no room to enqueue an object, an additional block (of
+	// equal size to the last block) is added. Blocks are never removed.
+
+public:
+	typedef T value_type;
+
+	// Constructs a queue that can hold at least `size` elements without further
+	// allocations. If more than MAX_BLOCK_SIZE elements are requested,
+	// then several blocks of MAX_BLOCK_SIZE each are reserved (including
+	// at least one extra buffer block).
+	AE_NO_TSAN explicit ReaderWriterQueue(size_t size = 15)
+#ifndef NDEBUG
+		: enqueuing(false)
+		,dequeuing(false)
+#endif
+	{
+		assert(MAX_BLOCK_SIZE == ceilToPow2(MAX_BLOCK_SIZE) && "MAX_BLOCK_SIZE must be a power of 2");
+		assert(MAX_BLOCK_SIZE >= 2 && "MAX_BLOCK_SIZE must be at least 2");
+		
+		Block* firstBlock = nullptr;
+		
+		largestBlockSize = ceilToPow2(size + 1);		// We need a spare slot to fit size elements in the block
+		if (largestBlockSize > MAX_BLOCK_SIZE * 2) {
+			// We need a spare block in case the producer is writing to a different block the consumer is reading from, and
+			// wants to enqueue the maximum number of elements. We also need a spare element in each block to avoid the ambiguity
+			// between front == tail meaning "empty" and "full".
+			// So the effective number of slots that are guaranteed to be usable at any time is the block size - 1 times the
+			// number of blocks - 1. Solving for size and applying a ceiling to the division gives us (after simplifying):
+			size_t initialBlockCount = (size + MAX_BLOCK_SIZE * 2 - 3) / (MAX_BLOCK_SIZE - 1);
+			largestBlockSize = MAX_BLOCK_SIZE;
+			Block* lastBlock = nullptr;
+			for (size_t i = 0; i != initialBlockCount; ++i) {
+				auto block = make_block(largestBlockSize);
+				if (block == nullptr) {
+#ifdef MOODYCAMEL_EXCEPTIONS_ENABLED
+					throw std::bad_alloc();
+#else
+					abort();
+#endif
+				}
+				if (firstBlock == nullptr) {
+					firstBlock = block;
+				}
+				else {
+					lastBlock->next = block;
+				}
+				lastBlock = block;
+				block->next = firstBlock;
+			}
+		}
+		else {
+			firstBlock = make_block(largestBlockSize);
+			if (firstBlock == nullptr) {
+#ifdef MOODYCAMEL_EXCEPTIONS_ENABLED
+				throw std::bad_alloc();
+#else
+				abort();
+#endif
+			}
+			firstBlock->next = firstBlock;
+		}
+		frontBlock = firstBlock;
+		tailBlock = firstBlock;
+		
+		// Make sure the reader/writer threads will have the initialized memory setup above:
+		fence(memory_order_sync);
+	}
+
+	// Note: The queue should not be accessed concurrently while it's
+	// being moved. It's up to the user to synchronize this.
+	AE_NO_TSAN ReaderWriterQueue(ReaderWriterQueue&& other)
+		: frontBlock(other.frontBlock.load()),
+		tailBlock(other.tailBlock.load()),
+		largestBlockSize(other.largestBlockSize)
+#ifndef NDEBUG
+		,enqueuing(false)
+		,dequeuing(false)
+#endif
+	{
+		other.largestBlockSize = 32;
+		Block* b = other.make_block(other.largestBlockSize);
+		if (b == nullptr) {
+#ifdef MOODYCAMEL_EXCEPTIONS_ENABLED
+			throw std::bad_alloc();
+#else
+			abort();
+#endif
+		}
+		b->next = b;
+		other.frontBlock = b;
+		other.tailBlock = b;
+	}
+
+	// Note: The queue should not be accessed concurrently while it's
+	// being moved. It's up to the user to synchronize this.
+	ReaderWriterQueue& operator=(ReaderWriterQueue&& other) AE_NO_TSAN
+	{
+		Block* b = frontBlock.load();
+		frontBlock = other.frontBlock.load();
+		other.frontBlock = b;
+		b = tailBlock.load();
+		tailBlock = other.tailBlock.load();
+		other.tailBlock = b;
+		std::swap(largestBlockSize, other.largestBlockSize);
+		return *this;
+	}
+
+	// Note: The queue should not be accessed concurrently while it's
+	// being deleted. It's up to the user to synchronize this.
+	AE_NO_TSAN ~ReaderWriterQueue()
+	{
+		// Make sure we get the latest version of all variables from other CPUs:
+		fence(memory_order_sync);
+
+		// Destroy any remaining objects in queue and free memory
+		Block* frontBlock_ = frontBlock;
+		Block* block = frontBlock_;
+		do {
+			Block* nextBlock = block->next;
+			size_t blockFront = block->front;
+			size_t blockTail = block->tail;
+
+			for (size_t i = blockFront; i != blockTail; i = (i + 1) & block->sizeMask) {
+				auto element = reinterpret_cast<T*>(block->data + i * sizeof(T));
+				element->~T();
+				(void)element;
+			}
+			
+			auto rawBlock = block->rawThis;
+			block->~Block();
+			std::free(rawBlock);
+			block = nextBlock;
+		} while (block != frontBlock_);
+	}
+
+
+	// Enqueues a copy of element if there is room in the queue.
+	// Returns true if the element was enqueued, false otherwise.
+	// Does not allocate memory.
+	AE_FORCEINLINE bool try_enqueue(T const& element) AE_NO_TSAN
+	{
+		return inner_enqueue<CannotAlloc>(element);
+	}
+
+	// Enqueues a moved copy of element if there is room in the queue.
+	// Returns true if the element was enqueued, false otherwise.
+	// Does not allocate memory.
+	AE_FORCEINLINE bool try_enqueue(T&& element) AE_NO_TSAN
+	{
+		return inner_enqueue<CannotAlloc>(std::forward<T>(element));
+	}
+
+#if MOODYCAMEL_HAS_EMPLACE
+	// Like try_enqueue() but with emplace semantics (i.e. construct-in-place).
+	template<typename... Args>
+	AE_FORCEINLINE bool try_emplace(Args&&... args) AE_NO_TSAN
+	{
+		return inner_enqueue<CannotAlloc>(std::forward<Args>(args)...);
+	}
+#endif
+
+	// Enqueues a copy of element on the queue.
+	// Allocates an additional block of memory if needed.
+	// Only fails (returns false) if memory allocation fails.
+	AE_FORCEINLINE bool enqueue(T const& element) AE_NO_TSAN
+	{
+		return inner_enqueue<CanAlloc>(element);
+	}
+
+	// Enqueues a moved copy of element on the queue.
+	// Allocates an additional block of memory if needed.
+	// Only fails (returns false) if memory allocation fails.
+	AE_FORCEINLINE bool enqueue(T&& element) AE_NO_TSAN
+	{
+		return inner_enqueue<CanAlloc>(std::forward<T>(element));
+	}
+
+#if MOODYCAMEL_HAS_EMPLACE
+	// Like enqueue() but with emplace semantics (i.e. construct-in-place).
+	template<typename... Args>
+	AE_FORCEINLINE bool emplace(Args&&... args) AE_NO_TSAN
+	{
+		return inner_enqueue<CanAlloc>(std::forward<Args>(args)...);
+	}
+#endif
+
+	// Attempts to dequeue an element; if the queue is empty,
+	// returns false instead. If the queue has at least one element,
+	// moves front to result using operator=, then returns true.
+	template<typename U>
+	bool try_dequeue(U& result) AE_NO_TSAN
+	{
+#ifndef NDEBUG
+		ReentrantGuard guard(this->dequeuing);
+#endif
+
+		// High-level pseudocode:
+		// Remember where the tail block is
+		// If the front block has an element in it, dequeue it
+		// Else
+		//     If front block was the tail block when we entered the function, return false
+		//     Else advance to next block and dequeue the item there
+
+		// Note that we have to use the value of the tail block from before we check if the front
+		// block is full or not, in case the front block is empty and then, before we check if the
+		// tail block is at the front block or not, the producer fills up the front block *and
+		// moves on*, which would make us skip a filled block. Seems unlikely, but was consistently
+		// reproducible in practice.
+		// In order to avoid overhead in the common case, though, we do a double-checked pattern
+		// where we have the fast path if the front block is not empty, then read the tail block,
+		// then re-read the front block and check if it's not empty again, then check if the tail
+		// block has advanced.
+		
+		Block* frontBlock_ = frontBlock.load();
+		size_t blockTail = frontBlock_->localTail;
+		size_t blockFront = frontBlock_->front.load();
+		
+		if (blockFront != blockTail || blockFront != (frontBlock_->localTail = frontBlock_->tail.load())) {
+			fence(memory_order_acquire);
+			
+		non_empty_front_block:
+			// Front block not empty, dequeue from here
+			auto element = reinterpret_cast<T*>(frontBlock_->data + blockFront * sizeof(T));
+			result = std::move(*element);
+			element->~T();
+
+			blockFront = (blockFront + 1) & frontBlock_->sizeMask;
+
+			fence(memory_order_release);
+			frontBlock_->front = blockFront;
+		}
+		else if (frontBlock_ != tailBlock.load()) {
+			fence(memory_order_acquire);
+
+			frontBlock_ = frontBlock.load();
+			blockTail = frontBlock_->localTail = frontBlock_->tail.load();
+			blockFront = frontBlock_->front.load();
+			fence(memory_order_acquire);
+			
+			if (blockFront != blockTail) {
+				// Oh look, the front block isn't empty after all
+				goto non_empty_front_block;
+			}
+			
+			// Front block is empty but there's another block ahead, advance to it
+			Block* nextBlock = frontBlock_->next;
+			// Don't need an acquire fence here since next can only ever be set on the tailBlock,
+			// and we're not the tailBlock, and we did an acquire earlier after reading tailBlock which
+			// ensures next is up-to-date on this CPU in case we recently were at tailBlock.
+
+			size_t nextBlockFront = nextBlock->front.load();
+			size_t nextBlockTail = nextBlock->localTail = nextBlock->tail.load();
+			fence(memory_order_acquire);
+
+			// Since the tailBlock is only ever advanced after being written to,
+			// we know there's for sure an element to dequeue on it
+			assert(nextBlockFront != nextBlockTail);
+			AE_UNUSED(nextBlockTail);
+
+			// We're done with this block, let the producer use it if it needs
+			fence(memory_order_release);		// Expose possibly pending changes to frontBlock->front from last dequeue
+			frontBlock = frontBlock_ = nextBlock;
+
+			compiler_fence(memory_order_release);	// Not strictly needed
+
+			auto element = reinterpret_cast<T*>(frontBlock_->data + nextBlockFront * sizeof(T));
+			
+			result = std::move(*element);
+			element->~T();
+
+			nextBlockFront = (nextBlockFront + 1) & frontBlock_->sizeMask;
+			
+			fence(memory_order_release);
+			frontBlock_->front = nextBlockFront;
+		}
+		else {
+			// No elements in current block and no other block to advance to
+			return false;
+		}
+
+		return true;
+	}
+
+
+	// Returns a pointer to the front element in the queue (the one that
+	// would be removed next by a call to `try_dequeue` or `pop`). If the
+	// queue appears empty at the time the method is called, nullptr is
+	// returned instead.
+	// Must be called only from the consumer thread.
+	T* peek() const AE_NO_TSAN
+	{
+#ifndef NDEBUG
+		ReentrantGuard guard(this->dequeuing);
+#endif
+		// See try_dequeue() for reasoning
+
+		Block* frontBlock_ = frontBlock.load();
+		size_t blockTail = frontBlock_->localTail;
+		size_t blockFront = frontBlock_->front.load();
+		
+		if (blockFront != blockTail || blockFront != (frontBlock_->localTail = frontBlock_->tail.load())) {
+			fence(memory_order_acquire);
+		non_empty_front_block:
+			return reinterpret_cast<T*>(frontBlock_->data + blockFront * sizeof(T));
+		}
+		else if (frontBlock_ != tailBlock.load()) {
+			fence(memory_order_acquire);
+			frontBlock_ = frontBlock.load();
+			blockTail = frontBlock_->localTail = frontBlock_->tail.load();
+			blockFront = frontBlock_->front.load();
+			fence(memory_order_acquire);
+			
+			if (blockFront != blockTail) {
+				goto non_empty_front_block;
+			}
+			
+			Block* nextBlock = frontBlock_->next;
+			
+			size_t nextBlockFront = nextBlock->front.load();
+			fence(memory_order_acquire);
+
+			assert(nextBlockFront != nextBlock->tail.load());
+			return reinterpret_cast<T*>(nextBlock->data + nextBlockFront * sizeof(T));
+		}
+		
+		return nullptr;
+	}
+	
+	// Removes the front element from the queue, if any, without returning it.
+	// Returns true on success, or false if the queue appeared empty at the time
+	// `pop` was called.
+	bool pop() AE_NO_TSAN
+	{
+#ifndef NDEBUG
+		ReentrantGuard guard(this->dequeuing);
+#endif
+		// See try_dequeue() for reasoning
+		
+		Block* frontBlock_ = frontBlock.load();
+		size_t blockTail = frontBlock_->localTail;
+		size_t blockFront = frontBlock_->front.load();
+		
+		if (blockFront != blockTail || blockFront != (frontBlock_->localTail = frontBlock_->tail.load())) {
+			fence(memory_order_acquire);
+			
+		non_empty_front_block:
+			auto element = reinterpret_cast<T*>(frontBlock_->data + blockFront * sizeof(T));
+			element->~T();
+
+			blockFront = (blockFront + 1) & frontBlock_->sizeMask;
+
+			fence(memory_order_release);
+			frontBlock_->front = blockFront;
+		}
+		else if (frontBlock_ != tailBlock.load()) {
+			fence(memory_order_acquire);
+			frontBlock_ = frontBlock.load();
+			blockTail = frontBlock_->localTail = frontBlock_->tail.load();
+			blockFront = frontBlock_->front.load();
+			fence(memory_order_acquire);
+			
+			if (blockFront != blockTail) {
+				goto non_empty_front_block;
+			}
+			
+			// Front block is empty but there's another block ahead, advance to it
+			Block* nextBlock = frontBlock_->next;
+			
+			size_t nextBlockFront = nextBlock->front.load();
+			size_t nextBlockTail = nextBlock->localTail = nextBlock->tail.load();
+			fence(memory_order_acquire);
+
+			assert(nextBlockFront != nextBlockTail);
+			AE_UNUSED(nextBlockTail);
+
+			fence(memory_order_release);
+			frontBlock = frontBlock_ = nextBlock;
+
+			compiler_fence(memory_order_release);
+
+			auto element = reinterpret_cast<T*>(frontBlock_->data + nextBlockFront * sizeof(T));
+			element->~T();
+
+			nextBlockFront = (nextBlockFront + 1) & frontBlock_->sizeMask;
+			
+			fence(memory_order_release);
+			frontBlock_->front = nextBlockFront;
+		}
+		else {
+			// No elements in current block and no other block to advance to
+			return false;
+		}
+
+		return true;
+	}
+	
+	// Returns the approximate number of items currently in the queue.
+	// Safe to call from both the producer and consumer threads.
+	inline size_t size_approx() const AE_NO_TSAN
+	{
+		size_t result = 0;
+		Block* frontBlock_ = frontBlock.load();
+		Block* block = frontBlock_;
+		do {
+			fence(memory_order_acquire);
+			size_t blockFront = block->front.load();
+			size_t blockTail = block->tail.load();
+			result += (blockTail - blockFront) & block->sizeMask;
+			block = block->next.load();
+		} while (block != frontBlock_);
+		return result;
+	}
+
+	// Returns the total number of items that could be enqueued without incurring
+	// an allocation when this queue is empty.
+	// Safe to call from both the producer and consumer threads.
+	//
+	// NOTE: The actual capacity during usage may be different depending on the consumer.
+	//       If the consumer is removing elements concurrently, the producer cannot add to
+	//       the block the consumer is removing from until it's completely empty, except in
+	//       the case where the producer was writing to the same block the consumer was
+	//       reading from the whole time.
+	inline size_t max_capacity() const {
+		size_t result = 0;
+		Block* frontBlock_ = frontBlock.load();
+		Block* block = frontBlock_;
+		do {
+			fence(memory_order_acquire);
+			result += block->sizeMask;
+			block = block->next.load();
+		} while (block != frontBlock_);
+		return result;
+	}
+
+
+private:
+	enum AllocationMode { CanAlloc, CannotAlloc };
+
+#if MOODYCAMEL_HAS_EMPLACE
+	template<AllocationMode canAlloc, typename... Args>
+	bool inner_enqueue(Args&&... args) AE_NO_TSAN
+#else
+	template<AllocationMode canAlloc, typename U>
+	bool inner_enqueue(U&& element) AE_NO_TSAN
+#endif
+	{
+#ifndef NDEBUG
+		ReentrantGuard guard(this->enqueuing);
+#endif
+
+		// High-level pseudocode (assuming we're allowed to alloc a new block):
+		// If room in tail block, add to tail
+		// Else check next block
+		//     If next block is not the head block, enqueue on next block
+		//     Else create a new block and enqueue there
+		//     Advance tail to the block we just enqueued to
+
+		Block* tailBlock_ = tailBlock.load();
+		size_t blockFront = tailBlock_->localFront;
+		size_t blockTail = tailBlock_->tail.load();
+
+		size_t nextBlockTail = (blockTail + 1) & tailBlock_->sizeMask;
+		if (nextBlockTail != blockFront || nextBlockTail != (tailBlock_->localFront = tailBlock_->front.load())) {
+			fence(memory_order_acquire);
+			// This block has room for at least one more element
+			char* location = tailBlock_->data + blockTail * sizeof(T);
+#if MOODYCAMEL_HAS_EMPLACE
+			new (location) T(std::forward<Args>(args)...);
+#else
+			new (location) T(std::forward<U>(element));
+#endif
+
+			fence(memory_order_release);
+			tailBlock_->tail = nextBlockTail;
+		}
+		else {
+			fence(memory_order_acquire);
+			if (tailBlock_->next.load() != frontBlock) {
+				// Note that the reason we can't advance to the frontBlock and start adding new entries there
+				// is because if we did, then dequeue would stay in that block, eventually reading the new values,
+				// instead of advancing to the next full block (whose values were enqueued first and so should be
+				// consumed first).
+
+				fence(memory_order_acquire);		// Ensure we get latest writes if we got the latest frontBlock
+
+				// tailBlock is full, but there's a free block ahead, use it
+				Block* tailBlockNext = tailBlock_->next.load();
+				size_t nextBlockFront = tailBlockNext->localFront = tailBlockNext->front.load();
+				nextBlockTail = tailBlockNext->tail.load();
+				fence(memory_order_acquire);
+
+				// This block must be empty since it's not the head block and we
+				// go through the blocks in a circle
+				assert(nextBlockFront == nextBlockTail);
+				tailBlockNext->localFront = nextBlockFront;
+
+				char* location = tailBlockNext->data + nextBlockTail * sizeof(T);
+#if MOODYCAMEL_HAS_EMPLACE
+				new (location) T(std::forward<Args>(args)...);
+#else
+				new (location) T(std::forward<U>(element));
+#endif
+
+				tailBlockNext->tail = (nextBlockTail + 1) & tailBlockNext->sizeMask;
+
+				fence(memory_order_release);
+				tailBlock = tailBlockNext;
+			}
+			else if (canAlloc == CanAlloc) {
+				// tailBlock is full and there's no free block ahead; create a new block
+				auto newBlockSize = largestBlockSize >= MAX_BLOCK_SIZE ? largestBlockSize : largestBlockSize * 2;
+				auto newBlock = make_block(newBlockSize);
+				if (newBlock == nullptr) {
+					// Could not allocate a block!
+					return false;
+				}
+				largestBlockSize = newBlockSize;
+
+#if MOODYCAMEL_HAS_EMPLACE
+				new (newBlock->data) T(std::forward<Args>(args)...);
+#else
+				new (newBlock->data) T(std::forward<U>(element));
+#endif
+				assert(newBlock->front == 0);
+				newBlock->tail = newBlock->localTail = 1;
+
+				newBlock->next = tailBlock_->next.load();
+				tailBlock_->next = newBlock;
+
+				// Might be possible for the dequeue thread to see the new tailBlock->next
+				// *without* seeing the new tailBlock value, but this is OK since it can't
+				// advance to the next block until tailBlock is set anyway (because the only
+				// case where it could try to read the next is if it's already at the tailBlock,
+				// and it won't advance past tailBlock in any circumstance).
+
+				fence(memory_order_release);
+				tailBlock = newBlock;
+			}
+			else if (canAlloc == CannotAlloc) {
+				// Would have had to allocate a new block to enqueue, but not allowed
+				return false;
+			}
+			else {
+				assert(false && "Should be unreachable code");
+				return false;
+			}
+		}
+
+		return true;
+	}
+
+
+	// Disable copying
+	ReaderWriterQueue(ReaderWriterQueue const&) = delete;
+
+	// Disable assignment
+	ReaderWriterQueue& operator=(ReaderWriterQueue const&) = delete;
+
+
+	AE_FORCEINLINE static size_t ceilToPow2(size_t x)
+	{
+		// From http://graphics.stanford.edu/~seander/bithacks.html#RoundUpPowerOf2
+		--x;
+		x |= x >> 1;
+		x |= x >> 2;
+		x |= x >> 4;
+		for (size_t i = 1; i < sizeof(size_t); i <<= 1) {
+			x |= x >> (i << 3);
+		}
+		++x;
+		return x;
+	}
+	
+	template<typename U>
+	static AE_FORCEINLINE char* align_for(char* ptr) AE_NO_TSAN
+	{
+		const std::size_t alignment = std::alignment_of<U>::value;
+		return ptr + (alignment - (reinterpret_cast<std::uintptr_t>(ptr) % alignment)) % alignment;
+	}
+private:
+#ifndef NDEBUG
+	struct ReentrantGuard
+	{
+		AE_NO_TSAN ReentrantGuard(weak_atomic<bool>& _inSection)
+			: inSection(_inSection)
+		{
+			assert(!inSection && "Concurrent (or re-entrant) enqueue or dequeue operation detected (only one thread at a time may hold the producer or consumer role)");
+			inSection = true;
+		}
+
+		AE_NO_TSAN ~ReentrantGuard() { inSection = false; }
+
+	private:
+		ReentrantGuard& operator=(ReentrantGuard const&);
+
+	private:
+		weak_atomic<bool>& inSection;
+	};
+#endif
+
+	struct Block
+	{
+		// Avoid false-sharing by putting highly contended variables on their own cache lines
+		weak_atomic<size_t> front;	// (Atomic) Elements are read from here
+		size_t localTail;			// An uncontended shadow copy of tail, owned by the consumer
+		
+		char cachelineFiller0[MOODYCAMEL_CACHE_LINE_SIZE - sizeof(weak_atomic<size_t>) - sizeof(size_t)];
+		weak_atomic<size_t> tail;	// (Atomic) Elements are enqueued here
+		size_t localFront;
+		
+		char cachelineFiller1[MOODYCAMEL_CACHE_LINE_SIZE - sizeof(weak_atomic<size_t>) - sizeof(size_t)];	// next isn't very contended, but we don't want it on the same cache line as tail (which is)
+		weak_atomic<Block*> next;	// (Atomic)
+		
+		char* data;		// Contents (on heap) are aligned to T's alignment
+
+		const size_t sizeMask;
+
+
+		// size must be a power of two (and greater than 0)
+		AE_NO_TSAN Block(size_t const& _size, char* _rawThis, char* _data)
+			: front(0UL), localTail(0), tail(0UL), localFront(0), next(nullptr), data(_data), sizeMask(_size - 1), rawThis(_rawThis)
+		{
+		}
+
+	private:
+		// C4512 - Assignment operator could not be generated
+		Block& operator=(Block const&);
+
+	public:
+		char* rawThis;
+	};
+	
+	
+	static Block* make_block(size_t capacity) AE_NO_TSAN
+	{
+		// Allocate enough memory for the block itself, as well as all the elements it will contain
+		auto size = sizeof(Block) + std::alignment_of<Block>::value - 1;
+		size += sizeof(T) * capacity + std::alignment_of<T>::value - 1;
+		auto newBlockRaw = static_cast<char*>(std::malloc(size));
+		if (newBlockRaw == nullptr) {
+			return nullptr;
+		}
+		
+		auto newBlockAligned = align_for<Block>(newBlockRaw);
+		auto newBlockData = align_for<T>(newBlockAligned + sizeof(Block));
+		return new (newBlockAligned) Block(capacity, newBlockRaw, newBlockData);
+	}
+
+private:
+	weak_atomic<Block*> frontBlock;		// (Atomic) Elements are dequeued from this block
+	
+	char cachelineFiller[MOODYCAMEL_CACHE_LINE_SIZE - sizeof(weak_atomic<Block*>)];
+	weak_atomic<Block*> tailBlock;		// (Atomic) Elements are enqueued to this block
+
+	size_t largestBlockSize;
+
+#ifndef NDEBUG
+	weak_atomic<bool> enqueuing;
+	mutable weak_atomic<bool> dequeuing;
+#endif
+};
+
+// Like ReaderWriterQueue, but also provides blocking operations
+template<typename T, size_t MAX_BLOCK_SIZE = 512>
+class BlockingReaderWriterQueue
+{
+private:
+	typedef ::moodycamel::ReaderWriterQueue<T, MAX_BLOCK_SIZE> ReaderWriterQueue;
+	
+public:
+	explicit BlockingReaderWriterQueue(size_t size = 15) AE_NO_TSAN
+		: inner(size), sema(new spsc_sema::LightweightSemaphore())
+	{ }
+
+	BlockingReaderWriterQueue(BlockingReaderWriterQueue&& other) AE_NO_TSAN
+		: inner(std::move(other.inner)), sema(std::move(other.sema))
+	{ }
+
+	BlockingReaderWriterQueue& operator=(BlockingReaderWriterQueue&& other) AE_NO_TSAN
+	{
+		std::swap(sema, other.sema);
+		std::swap(inner, other.inner);
+		return *this;
+	}
+
+
+	// Enqueues a copy of element if there is room in the queue.
+	// Returns true if the element was enqueued, false otherwise.
+	// Does not allocate memory.
+	AE_FORCEINLINE bool try_enqueue(T const& element) AE_NO_TSAN
+	{
+		if (inner.try_enqueue(element)) {
+			sema->signal();
+			return true;
+		}
+		return false;
+	}
+
+	// Enqueues a moved copy of element if there is room in the queue.
+	// Returns true if the element was enqueued, false otherwise.
+	// Does not allocate memory.
+	AE_FORCEINLINE bool try_enqueue(T&& element) AE_NO_TSAN
+	{
+		if (inner.try_enqueue(std::forward<T>(element))) {
+			sema->signal();
+			return true;
+		}
+		return false;
+	}
+
+#if MOODYCAMEL_HAS_EMPLACE
+	// Like try_enqueue() but with emplace semantics (i.e. construct-in-place).
+	template<typename... Args>
+	AE_FORCEINLINE bool try_emplace(Args&&... args) AE_NO_TSAN
+	{
+		if (inner.try_emplace(std::forward<Args>(args)...)) {
+			sema->signal();
+			return true;
+		}
+		return false;
+	}
+#endif
+
+
+	// Enqueues a copy of element on the queue.
+	// Allocates an additional block of memory if needed.
+	// Only fails (returns false) if memory allocation fails.
+	AE_FORCEINLINE bool enqueue(T const& element) AE_NO_TSAN
+	{
+		if (inner.enqueue(element)) {
+			sema->signal();
+			return true;
+		}
+		return false;
+	}
+
+	// Enqueues a moved copy of element on the queue.
+	// Allocates an additional block of memory if needed.
+	// Only fails (returns false) if memory allocation fails.
+	AE_FORCEINLINE bool enqueue(T&& element) AE_NO_TSAN
+	{
+		if (inner.enqueue(std::forward<T>(element))) {
+			sema->signal();
+			return true;
+		}
+		return false;
+	}
+
+#if MOODYCAMEL_HAS_EMPLACE
+	// Like enqueue() but with emplace semantics (i.e. construct-in-place).
+	template<typename... Args>
+	AE_FORCEINLINE bool emplace(Args&&... args) AE_NO_TSAN
+	{
+		if (inner.emplace(std::forward<Args>(args)...)) {
+			sema->signal();
+			return true;
+		}
+		return false;
+	}
+#endif
+
+
+	// Attempts to dequeue an element; if the queue is empty,
+	// returns false instead. If the queue has at least one element,
+	// moves front to result using operator=, then returns true.
+	template<typename U>
+	bool try_dequeue(U& result) AE_NO_TSAN
+	{
+		if (sema->tryWait()) {
+			bool success = inner.try_dequeue(result);
+			assert(success);
+			AE_UNUSED(success);
+			return true;
+		}
+		return false;
+	}
+	
+	
+	// Attempts to dequeue an element; if the queue is empty,
+	// waits until an element is available, then dequeues it.
+	template<typename U>
+	void wait_dequeue(U& result) AE_NO_TSAN
+	{
+		while (!sema->wait());
+		bool success = inner.try_dequeue(result);
+		AE_UNUSED(result);
+		assert(success);
+		AE_UNUSED(success);
+	}
+
+
+	// Attempts to dequeue an element; if the queue is empty,
+	// waits until an element is available up to the specified timeout,
+	// then dequeues it and returns true, or returns false if the timeout
+	// expires before an element can be dequeued.
+	// Using a negative timeout indicates an indefinite timeout,
+	// and is thus functionally equivalent to calling wait_dequeue.
+	template<typename U>
+	bool wait_dequeue_timed(U& result, std::int64_t timeout_usecs) AE_NO_TSAN
+	{
+		if (!sema->wait(timeout_usecs)) {
+			return false;
+		}
+		bool success = inner.try_dequeue(result);
+		AE_UNUSED(result);
+		assert(success);
+		AE_UNUSED(success);
+		return true;
+	}
+
+
+#if __cplusplus > 199711L || _MSC_VER >= 1700
+	// Attempts to dequeue an element; if the queue is empty,
+	// waits until an element is available up to the specified timeout,
+	// then dequeues it and returns true, or returns false if the timeout
+	// expires before an element can be dequeued.
+	// Using a negative timeout indicates an indefinite timeout,
+	// and is thus functionally equivalent to calling wait_dequeue.
+	template<typename U, typename Rep, typename Period>
+	inline bool wait_dequeue_timed(U& result, std::chrono::duration<Rep, Period> const& timeout) AE_NO_TSAN
+	{
+		return wait_dequeue_timed(result, std::chrono::duration_cast<std::chrono::microseconds>(timeout).count());
+	}
+#endif
+
+
+	// Returns a pointer to the front element in the queue (the one that
+	// would be removed next by a call to `try_dequeue` or `pop`). If the
+	// queue appears empty at the time the method is called, nullptr is
+	// returned instead.
+	// Must be called only from the consumer thread.
+	AE_FORCEINLINE T* peek() const AE_NO_TSAN
+	{
+		return inner.peek();
+	}
+	
+	// Removes the front element from the queue, if any, without returning it.
+	// Returns true on success, or false if the queue appeared empty at the time
+	// `pop` was called.
+	AE_FORCEINLINE bool pop() AE_NO_TSAN
+	{
+		if (sema->tryWait()) {
+			bool result = inner.pop();
+			assert(result);
+			AE_UNUSED(result);
+			return true;
+		}
+		return false;
+	}
+	
+	// Returns the approximate number of items currently in the queue.
+	// Safe to call from both the producer and consumer threads.
+	AE_FORCEINLINE size_t size_approx() const AE_NO_TSAN
+	{
+		return sema->availableApprox();
+	}
+
+	// Returns the total number of items that could be enqueued without incurring
+	// an allocation when this queue is empty.
+	// Safe to call from both the producer and consumer threads.
+	//
+	// NOTE: The actual capacity during usage may be different depending on the consumer.
+	//       If the consumer is removing elements concurrently, the producer cannot add to
+	//       the block the consumer is removing from until it's completely empty, except in
+	//       the case where the producer was writing to the same block the consumer was
+	//       reading from the whole time.
+	AE_FORCEINLINE size_t max_capacity() const {
+		return inner.max_capacity();
+	}
+
+private:
+	// Disable copying & assignment
+	BlockingReaderWriterQueue(BlockingReaderWriterQueue const&) {  }
+	BlockingReaderWriterQueue& operator=(BlockingReaderWriterQueue const&) {  }
+	
+private:
+	ReaderWriterQueue inner;
+	std::unique_ptr<spsc_sema::LightweightSemaphore> sema;
+};
+
+}    // end namespace moodycamel
+
+#ifdef AE_VCPP
+#pragma warning(pop)
+#endif
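BlockingReaderWriterQueue adds blocking to the lock-free ReaderWriterQueue through one invariant. Every successful enqueue is followed by `sema->signal()`, and every dequeue first consumes one semaphore count. As a result, the inner `try_dequeue` after a successful wait cannot fail, which is why the code only `assert`s on it.

The sketch below models that contract with a mutex/condition-variable semaphore and a locked `std::deque`. `CountingSemaphore`, `SimpleQueue`, `BlockingQueueModel` and `sum_through_queue` are hypothetical names, not part of moodycamel. The model is deliberately lock-based: it illustrates the invariant, not the library's lock-free performance.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for spsc_sema::LightweightSemaphore.
class CountingSemaphore {
public:
    void signal() {
        { std::lock_guard<std::mutex> lk(m_); ++count_; }
        cv_.notify_one();
    }
    void wait() {
        std::unique_lock<std::mutex> lk(m_);
        cv_.wait(lk, [this] { return count_ > 0; });
        --count_;
    }
    bool tryWait() {
        std::lock_guard<std::mutex> lk(m_);
        if (count_ == 0) return false;
        --count_;
        return true;
    }
    std::size_t availableApprox() {
        std::lock_guard<std::mutex> lk(m_);
        return count_;
    }
private:
    std::mutex m_;
    std::condition_variable cv_;
    std::size_t count_ = 0;
};

// Hypothetical, lock-based stand-in for the inner ReaderWriterQueue.
template <typename T>
class SimpleQueue {
public:
    void push(T v) { std::lock_guard<std::mutex> lk(m_); q_.push_back(std::move(v)); }
    bool try_pop(T& out) {
        std::lock_guard<std::mutex> lk(m_);
        if (q_.empty()) return false;
        out = std::move(q_.front());
        q_.pop_front();
        return true;
    }
private:
    std::mutex m_;
    std::deque<T> q_;
};

// Same contract as BlockingReaderWriterQueue: insert first, then signal;
// consume one semaphore count before popping, so the pop cannot fail.
template <typename T>
class BlockingQueueModel {
public:
    void enqueue(T v) { inner_.push(std::move(v)); sema_.signal(); }
    void wait_dequeue(T& out) {
        sema_.wait();
        bool ok = inner_.try_pop(out);
        assert(ok);  // guaranteed by the semaphore count
        (void)ok;
    }
    bool try_dequeue(T& out) {
        if (!sema_.tryWait()) return false;
        bool ok = inner_.try_pop(out);
        assert(ok);
        (void)ok;
        return true;
    }
    std::size_t size_approx() { return sema_.availableApprox(); }
private:
    SimpleQueue<T> inner_;
    CountingSemaphore sema_;
};

// Demo: one producer thread, one consumer (the calling thread).
inline long long sum_through_queue(int n) {
    BlockingQueueModel<int> q;
    std::thread producer([&q, n] { for (int i = 1; i <= n; ++i) q.enqueue(i); });
    long long sum = 0;
    for (int i = 0; i < n; ++i) { int v = 0; q.wait_dequeue(v); sum += v; }
    producer.join();
    return sum;
}
```

As in the original, a single producer and a single consumer are assumed. `size_approx` reads the semaphore count, mirroring `sema->availableApprox()`.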
--- a/duix-sdk/src/main/cpp/dhmfcc/AudioFFT.cpp
+++ b/duix-sdk/src/main/cpp/dhmfcc/AudioFFT.cpp
--- a/duix-sdk/src/main/cpp/dhmfcc/dhpcm.cpp
+++ b/duix-sdk/src/main/cpp/dhmfcc/dhpcm.cpp
@ -0,0 +1,803 @@
+#include "dhpcm.h"
+#include "mfcc/mfcc.hpp"
+#include <stdio.h>
+#include "aicommon.h"
+#include <vector>
+#include <string>
+#include "opencv2/core.hpp"
+#ifdef USE_HELPER    
+#include "dhdatahelper.h"
+#endif
+
+
+
+PcmItem::PcmItem(int sentid,int minoff,int maxblock,int flip,int inx){
+  m_flip = flip;
+  m_inx = inx;
+  m_sentid = sentid;
+  m_maxblock = maxblock;
+  //int dist = minoff - STREAM_MFCC_FILL;
+  //if(dist>0) m_minoff = dist;
+  m_minoff = minoff;
+  int allcnt = m_minoff + maxblock + 2*STREAM_MFCC_FILL;
+  pcm_allsamp = allcnt*STREAM_BASE_SAMP;
+  mel_allcnt = pcm_allsamp/160+1;
+  bnf_allcnt = mel_allcnt*0.25f - 0.75f;
+  //printf("==minoff %d max %d allcnt %d melcnt %d bnfcnt %d\n", minoff,maxblock,allcnt,mel_allcnt,bnf_allcnt );
+  m_pcm = jmat_alloc(STREAM_BASE_SAMP,allcnt,1,0,4,NULL);
+  //m_pcm = new jmat_t(STREAM_BASE_SAMP,allcnt,1);
+  m_mel = jmat_alloc(STREAM_BASE_MEL,mel_allcnt,1,0,4,NULL);
+  //m_mel = new jmat_t(STREAM_BASE_MEL,mel_allcnt,1);
+  m_bnf = jmat_alloc(STREAM_BASE_BNF,bnf_allcnt,1,0,4,NULL);
+  m_bnfflip = jmat_alloc(STREAM_BASE_BNF,bnf_allcnt,1,0,4,NULL);
+  //m_bnf = new jmat_t(STREAM_BASE_BNF,bnf_allcnt,1);
+  //gjvad_alloc(&m_vad,STREAM_BASE_SAMP/2);
+  mat_flip = jmat_null();
+}
+
+PcmItem::~PcmItem(){
+  if(m_pcm) jmat_free(m_pcm);
+  if(m_mel) jmat_free(m_mel);
+  if(m_bnf)jmat_free(m_bnf);
+  if(m_wav)jmat_free(m_wav);
+  if(m_mfcc)jmat_free(m_mfcc);
+  if(m_bnfflip)jmat_free(m_bnfflip);
+  jmat_deref(mat_flip);
+  //gjvad_free(&m_vad);
+}
+
+int PcmItem::reset(){
+  jbuf_zeros((jbuf_t*)m_pcm);
+  jbuf_zeros((jbuf_t*)m_mel);
+  jbuf_zeros((jbuf_t*)m_bnf);
+  return 0;
+}
+
+int PcmItem::fillPcm(uint64_t sessid,uint64_t tickinx,jmat_t* premat,jmat_t* mat){
+  m_wav = mat;
+  pcm_block = mat->height;
+  if(pcm_block>(m_maxblock+STREAM_MFCC_FILL))return -1;
+  pre_block = premat?premat->height:0;
+  int pcmcnt = pcm_block ;
+
+  int allcnt = m_minoff + pcmcnt + 2*STREAM_MFCC_FILL;
+  //printf("===off %d pcm %d pad %d\n",m_minoff,pcmcnt,2*STREAM_MFCC_FILL);
+  pcm_allsamp = allcnt*STREAM_BASE_SAMP;
+  mel_allcnt = pcm_allsamp/160+1;
+  bnf_allcnt = mel_allcnt*0.25f - 0.75f;
+
+  m_sessid = sessid;
+  m_pcminx = tickinx;
+  {
+    // Leading region: zeros, then the last pre_block blocks of the previous item (int16 -> float).
+    int dlen = m_minoff + STREAM_MFCC_FILL;
+    int blank = dlen - pre_block;
+    if(blank){
+      float* pbuf = (float*)m_pcm->data;
+      int samp = blank*STREAM_BASE_SAMP;
+      memset(pbuf,0,samp*sizeof(float));
+    }
+    if(pre_block){
+      short* ps = (short*)premat->data;
+      for(int k=blank;k<dlen;k++){
+        float* pbuf = (float*)jmat_row(m_pcm,k);
+        for(int m=0;m<STREAM_BASE_SAMP;m++){
+          *pbuf++ = *ps++/32768.f;
+        }
+      }
+    }
+  }
+  {
+    // Current blocks, converted from int16 to float in [-1, 1).
+    int dlen = pcmcnt + STREAM_MFCC_FILL;
+    int blank = dlen - pcm_block;
+    int offset = m_minoff + STREAM_MFCC_FILL;
+    short* ps = (short*)mat->data;
+    for(int k=0;k<pcm_block;k++){
+      float* pbuf = (float*)jmat_row(m_pcm,k+offset);
+      for(int m=0;m<STREAM_BASE_SAMP;m++){
+        *pbuf++ = *ps++/32768.f;
+      }
+    }
+    if(blank){
+      // Trailing zero padding: blank == STREAM_MFCC_FILL blocks after the current PCM.
+      float* pbuf = (float*)jmat_row(m_pcm,offset+pcm_block);
+      int samp = blank*STREAM_BASE_SAMP;
+      memset(pbuf,0,samp*sizeof(float));
+    }
+  }
+  return 0;
+}
+
+int PcmItem::checkValid(uint64_t tickinx){
+  if(!tickinx)return 1;
+  return tickinx<=(m_pcminx+pcm_block);//&&(tickinx<=(m_pcminx+pcm_block));
+}
+
+jmat_t* PcmItem::readlast(int minoff){
+  if(minoff>pcm_block)return NULL;
+  int start = pcm_block - minoff;
+  jmat_t* mpre = jmat_alloc(STREAM_BASE_PCM,minoff,1,0,1, m_wav->data + start*STREAM_BASE_PCM); // start is a block index; rows are STREAM_BASE_PCM bytes
+  return mpre;
+}
+
+int PcmItem::readblock(){
+  return  pcm_read;
+}
+
+int PcmItem::numblock(){
+  return pcm_block;
+}
+
+int PcmItem::readbnf(char* buf){
+  if(!m_ready)return 0;
+  char* mdata = jmat_row(m_mfcc,0);
+  int cnt = pcm_block ;
+  memcpy(buf,mdata,STREAM_BASE_BNF*sizeof(float)*cnt); // one BNF row per block, as PcmFile::readbnf expects
+  return 0;
+}
+
+int PcmItem::readblock(jmat_t* pcm,jmat_t* mfcc){
+  if(!m_ready)return 0;
+  if(pcm_read>=pcm_block)return 0;
+  if(pcm){
+    char* rdata = jmat_row(m_wav,pcm_read);
+    memcpy(pcm->data,rdata,STREAM_BASE_PCM);
+  }
+  int inx =  pcm_read?pcm_read-1:0;
+  char* mdata = jmat_row(m_mfcc,inx);
+  //printf("===inx %d mfcc %d\n",inx,m_mfcc->height);
+  memcpy(mfcc->data,mdata,STREAM_ALL_BNF);
+  pcm_read++;
+  return 1;
+}
+
+int PcmItem::readblock(int inx,jmat_t* pcm,jmat_t* mfcc){
+  if(!m_ready)return 0;
+  if(inx>=pcm_block)return 0;
+  if(pcm){
+    char* rdata = jmat_row(m_wav,inx);
+    memcpy(pcm->data,rdata,STREAM_BASE_PCM);
+  }
+  int newinx =  inx?inx-1:0;
+  if(m_flip){
+    jmat_reroi(mat_flip,m_mfcc,STREAM_BASE_BNF,20,0,newinx);
+#ifdef USE_HELPER    //jmat_dump(mat_flip);
+    cv::Mat sm = dh2cvmat(mat_flip);
+    jmat_reshape(mfcc,20,STREAM_BASE_BNF);
+    cv::Mat dm = dh2cvmat(mfcc);
+    cv::transpose(sm,dm);
+#endif
+    //jmat_dump(mfcc);
+  }else{
+    char* mdata = jmat_row(m_mfcc,newinx);
+  //printf("===inx %d mfcc %d\n",inx,m_mfcc->height);
+    memcpy(mfcc->data,mdata,STREAM_ALL_BNF);
+  }
+  return 1;
+}
+
+void PcmItem::dump(FILE* dumpfile){
+  printf("===dumpone %d\n",pcm_block);
+  for(int k=0;k<pcm_block;k++){
+    char* rdata = jmat_row(m_wav,k);
+    fwrite(rdata,1,STREAM_BASE_PCM,dumpfile);
+  }
+}
+
+int PcmItem::runWenet(WeAI* weai){
+  int rst = 0;
+  float* fwav = (float*)m_pcm->data;
+  float* mel = (float*)m_mel->data;
+  rst = DhWenet::calcmfcc(fwav,pcm_allsamp,mel,mel_allcnt);
+
+  //float* bnf = m_flip? (float*)m_bnfflip->data:(float*)m_bnf->data;
+  float* bnf = (float*)m_bnf->data;
+  //tooken
+  uint64_t tick = jtimer_msstamp();
+#ifdef AIRUN_FLAG
+  rst =  weai->run(mel,mel_allcnt,bnf,bnf_allcnt);
+#endif
+  int dist = jtimer_msstamp()-tick;
+  if(0){
+    float* pf = (float*)bnf;
+    for(int k=0;k<256;k++){
+      printf("=%d==%f\n",k,*pf++);
+    }
+  }
+
+  printf("===pcm %llu %d  mel %d bnf %d dist %d \n",(unsigned long long)tick,m_pcm->height,mel_allcnt,bnf_allcnt,dist);
+  /*
+  if(m_flip){
+    printf("==flip\n");
+    cv::Mat matbnf = dh2cvmat(m_bnf) ;
+    cv::Mat matflip =dh2cvmat(m_bnfflip);
+    cv::transpose(matflip,matbnf);
+    //jmat_reshape(m_bnf,256,bnf_allcnt);
+  }
+  */
+
+  //printf("===bbb \n");
+  int inxstart = m_minoff;
+  uint64_t tickinx = m_pcminx;
+  float* rbnf = (float*)jmat_row(m_bnf,inxstart);
+  int rcnt = pcm_block;
+  jmat_t* matbnf = jmat_alloc(STREAM_BASE_BNF,rcnt+19,1,0,4,NULL);
+  memcpy(matbnf->data,rbnf,matbnf->buf.size);
+  m_mfcc = matbnf;
+  /*
+  jmat_t* dmat = jmat_alloc(20,256,1,0,4,NULL);
+  cv::Mat bm =dh2cvmat(dmat);
+  for(int k=0;k<10;k++){
+    printf("====k%d\n",k);
+    jmat_t* mat = jmat_roi(m_mfcc,256,20,0,k);
+    cv::Mat am = dh2cvmat(mat) ;
+    cv::transpose(am,bm);
+    jmat_deref(mat);
+    break;
+  }
+  */
+  m_ready = 1;
+  return 0;
+}
+
+PcmFile::PcmFile(int fps,int minoff,int mincnt,int maxcnt){
+  m_fps = fps;
+  m_scale = fps*1.0f/25.0f;
+  m_adj = fps!=25;
+  m_minoff = minoff;
+  m_mincnt = mincnt;
+  m_maxcnt = maxcnt;
+  m_maxsize = maxcnt* STREAM_BASE_PCM;
+  m_minsize = mincnt* STREAM_BASE_PCM;
+  m_arrmax = (int*)malloc(sizeof(int)*1024);
+  memset(m_arrmax,0,sizeof(int)*1024);
+  m_arrmin = (int*)malloc(sizeof(int)*1024);
+  memset(m_arrmin,0,sizeof(int)*1024);
+}
+
+PcmFile::~PcmFile(){
+  for(int k=0;k<vec_pcm.size();k++){
+    PcmItem* item = vec_pcm[k];
+    delete item;
+  }
+  if(m_preitem){
+    delete m_preitem;
+    m_preitem = NULL;
+  }
+  free(m_arrmax);
+  free(m_arrmin);
+}
+
+int PcmFile::itemSize(){
+  return vec_pcm.size();
+}
+
+int PcmFile::process(int inx,WeAI* weai){
+  if(inx<0){
+    for(int k=0;k<vec_pcm.size();k++){
+      PcmItem* item = vec_pcm[k];
+      int rst = item->runWenet(weai);
+      m_calcblock = m_calcblock + item->numblock();
+      m_calccnt += 1;
+    }
+    return 0;
+  }else{
+    if(inx>=vec_pcm.size())return -1;
+    PcmItem* item = vec_pcm[inx];
+    int rst = item->runWenet(weai);
+    m_calcblock = m_calcblock + item->numblock();
+    m_calccnt += 1;
+    return rst;
+  }
+}
+
+int PcmFile::appenditem(jmat_t* mat,int noone){
+  int chkblock = mat->height;
+  int chkmin = m_lastitem?m_minoff:0;
+  if(vec_pcm.size()>=1024){ jmat_free(mat); return -1; } // m_arrmin/m_arrmax hold only 1024 entries
+  int inx = m_fileblock ;
+  PcmItem* item = new PcmItem(0,chkmin,chkblock,m_flip,inx);
+  jmat_t* mpre = NULL;
+  if(m_lastitem){
+    mpre = m_lastitem->readlast(chkmin);
+  }
+  int rst = item->fillPcm(0,0,mpre,mat);
+  vec_pcm.push_back(item);
+  m_lastitem = item;
+  m_arrmin[vec_pcm.size()-1] = m_fileblock;
+  m_fileblock += item->numblock();
+  //printf("===m_fileblock %d to %d \n",m_fileblock,fileBlock());
+  m_arrmax[vec_pcm.size()-1] = m_fileblock;
+
+  if(mpre)jmat_free(mpre);
+  return 0;
+}
+
+int PcmFile::prepare(char* buf,int size,char* prebuf,int presize){
+  int rst = 0;
+  m_presize = presize;
+  m_preblock = presize/STREAM_BASE_PCM;
+  int cursize = size;
+  char* curhead = buf;
+  if(m_preblock){
+    jmat_t* mat = jmat_alloc(STREAM_BASE_PCM,m_preblock,1,0,1,prebuf);
+    int chkblock = mat->height;
+    int chkmin = m_lastitem?m_minoff:0;
+    int inx = 0;
+    PcmItem* item = new PcmItem(0,chkmin,chkblock,m_flip,inx);
+    m_preitem = item;
+    m_lastitem = item;
+    rst = item->fillPcm(0,0,NULL,mat);
+  }
+  while(cursize >= m_maxsize){
+    jmat_t* mat = jmat_alloc(STREAM_BASE_PCM,m_maxcnt,1,0,1,NULL);
+    memcpy(mat->data ,curhead,m_maxsize);
+    rst += appenditem(mat);
+    cursize -= m_maxsize;
+    curhead += m_maxsize;
+    //printf("====cursize %d\n",cursize);
+  }
+  if(cursize>0){
+    int block = cursize / STREAM_BASE_PCM;
+    if(block<m_mincnt)block = m_mincnt;
+    jmat_t* mat = jmat_alloc(STREAM_BASE_PCM,block,1,0,1,NULL);
+    memset(mat->data,0,block*STREAM_BASE_PCM);   // zero-pad when block was raised to m_mincnt
+    memcpy(mat->data ,curhead,cursize<block*STREAM_BASE_PCM?cursize:block*STREAM_BASE_PCM); // never read past buf
+    rst += appenditem(mat);
+  }
+  return 0;
+}
+
+int PcmFile::setflip(int flip){
+  m_flip = flip;
+  return 0;
+}
+
+int PcmFile::prepare(std::string& pcmfn){
+  /*
+  void* fhnd = wav_read_open(pcmfn.c_str());
+  if(!fhnd)return -1;
+  int format, channels, sr, bits_per_sample;
+  unsigned int data_length;
+  int res = wav_get_header(fhnd, &format, &channels, &sr, &bits_per_sample, &data_length);
+  if(data_length<1) return -2;
+  int sample = data_length/2;
+  jbuf_t* pcmbuf = jbuf_alloc(data_length);
+  int rst = wav_read_data(fhnd,(unsigned char*)pcmbuf->data,data_length);
+  wav_read_close(fhnd);
+  int cursize = data_length;
+  char* curhead = pcmbuf->data;
+
+  rst =  prepare(curhead,cursize);
+  dhmem_deref(pcmbuf);
+  return rst;
+  */
+  return 0;
+}
+
+jmat_t* PcmFile::readbnf(int sinx){
+  jmat_t* bnf = jmat_alloc(STREAM_BASE_BNF,m_fileblock,1,0,4,NULL);
+
+  return bnf;
+}
+
+int PcmFile::readbnf(char* bnf,int bnfsize){
+  int block = fileBlock();
+  int allsize = block*STREAM_BASE_BNF*sizeof(float);
+  if(bnfsize<allsize)return -1;
+  jmat_t* mbnf = jmat_alloc(STREAM_BASE_BNF,block,1,0,4,bnf);
+  int inx = 0;
+  for(int k=0;k<vec_pcm.size();k++){
+    PcmItem* item = vec_pcm[k];
+    char* buf = jmat_row(mbnf,inx);
+    item->readbnf(buf);
+    inx += item->numblock();
+  }
+  return block;
+}
+
+int PcmFile::readblock(int sinx,jmat_t* pcm,jmat_t* feat){
+  //if(pcm->width!=STREAM_BASE_PCM)return -2001; 
+  //if(feat->width!=STREAM_BASE_BNF)return -2002; 
+  int inx = sinx/m_scale;
+  if(inx>=m_fileblock)return -1;
+  printf("===inx %d calc %d\n",inx,m_calccnt);
+  if(inx>=m_calcblock)return 0;
+  int rst = 0;
+  PcmItem* curitem = NULL;
+  int newinx = 0;
+  for(int k=0;k<m_calccnt;k++){
+    if((inx<m_arrmax[k])&&(inx>=m_arrmin[k])){
+      curitem = vec_pcm[k];
+      newinx = inx - m_arrmin[k];
+      break;
+    }
+  }
+  if(curitem){
+    rst = curitem->readblock(newinx,pcm,feat);
+    if(rst){
+      if(pcm)pcm->buf.sessid = inx;
+      feat->buf.sessid = inx;
+    }
+    return rst;
+  }
+  return 0;
+}
+
+PcmSession::PcmSession(uint64_t sessid,int minoff,int mincnt,int maxcnt){
+  m_sessid = sessid;
+  m_minoff = minoff;
+  m_mincnt = mincnt;
+  m_maxcnt = maxcnt;
+  m_checkcnt = (mincnt+maxcnt)/2;
+  m_maxsize = maxcnt* STREAM_BASE_PCM;
+  m_minsize = mincnt* STREAM_BASE_PCM;
+  int csize = STREAM_BASE_PCM;
+  m_pcmcache = (uint8_t*)malloc(STREAM_BASE_PCM*maxcnt*10);
+  m_cachepos = 0;
+  m_cachemax = STREAM_BASE_PCM*maxcnt*10;
+  m_lastitem = NULL;
+  m_arrflag = (int*)malloc(1024*sizeof(int));
+  memset(m_arrflag,0,1024*sizeof(int));
+  m_arrmax = (int*)malloc(sizeof(int)*1024000);
+  memset(m_arrmax,0,sizeof(int)*1024000);
+  m_arrmin = (int*)malloc(sizeof(int)*1024000);
+  memset(m_arrmin,0,sizeof(int)*1024000);
+}
+
+PcmSession::~PcmSession(){
+  //std::unique_lock lock(m_lock);
+  for(int k=0;k<vec_pcm.size();k++){
+    PcmItem* item = vec_pcm[k];
+    if(item) delete item;
+    vec_pcm[k] = NULL;
+  }
+  free(m_pcmcache);
+  free(m_arrflag);
+  free(m_arrmin);
+  free(m_arrmax);
+}
+
+int PcmSession::setflip(int flip){
+  m_flip = flip;
+  return 0;
+}
+
+int PcmSession::appenditem(jmat_t* mat,int noone){
+  //std::unique_lock lock(m_lock);
+  //printf("===append %d\n",mat->height*STREAM_BASE_PCM);
+  //printf("===cur %d min %d max %d\n",m_curflag,m_minoff,m_maxcnt);
+  int chkblock = mat->height;
+  //printf("===chkblock %d\n",chkblock);
+  int chkmin = m_lastitem?m_minoff:0;
+  int inx = m_fileblock ;
+  PcmItem* item = new PcmItem(m_curflag,chkmin,chkblock,m_flip,inx);
+  //printf("===check cur %d off %d block %d\n",m_curflag,chkmin,chkblock);
+  jmat_t* mpre = NULL;
+  if(m_lastitem){
+    mpre = m_lastitem->readlast(chkmin);
+  }
+  int rst = item->fillPcm(m_sessid,0,mpre,mat);
+  //printf("===fill %d\n",rst);
+  vec_pcm.push_back(item);
+  m_lastitem = item;
+  m_arrmin[vec_pcm.size()-1] = m_fileblock;
+  m_fileblock += item->numblock();
+  m_arrmax[vec_pcm.size()-1] = m_fileblock;
+
+  m_numpush += chkblock;
+  // mat is now owned by item and released in ~PcmItem (as m_wav)
+  m_workcnt ++;
+  if(mpre)jmat_free(mpre);
+  return 1;
+}
+
+int PcmSession::checkpcmcache(int flush){
+  if(m_cachepos<m_minsize)return 0;
+  //printf("===checkcache %d\n",m_cachepos);
+  uint8_t* curhead = m_pcmcache;
+  int cursize =  m_cachepos;
+  int rst = 0;
+  if(!m_lastitem){
+    jmat_t* mat = jmat_alloc(STREAM_BASE_PCM,m_mincnt,1,0,1,NULL);
+    memcpy(mat->data ,curhead,m_minsize);
+    rst += appenditem(mat);
+    cursize -= m_minsize;
+    curhead += m_minsize;
+  }
+  while(cursize >= m_maxsize){
+    jmat_t* mat = jmat_alloc(STREAM_BASE_PCM,m_maxcnt,1,0,1,NULL);
+    memcpy(mat->data ,curhead,m_maxsize);
+    rst += appenditem(mat);
+    cursize -= m_maxsize;
+    curhead += m_maxsize;
+  }
+  int dist =  m_calccnt - m_readcnt;
+  int force = dist<2; // flush a partial block when fewer than 2 computed items await reading (alt: distwait()<m_checkcnt)
+  //printf("===dist %d cal %d read %d\n",dist,m_calccnt,m_readcnt);
+  //printf("===force cnt %d\n",force);
+  if(force){
+    if(cursize >=m_minsize){
+      int chkblock = cursize / STREAM_BASE_PCM;
+      int chksize = chkblock * STREAM_BASE_PCM;
+      jmat_t* mat = jmat_alloc(STREAM_BASE_PCM,chkblock,1,0,1,NULL);
+      memcpy(mat->data ,curhead,chksize);
+      curhead += chksize;
+      cursize -= chksize;
+      rst += appenditem(mat);
+    }
+  }
+  if(curhead!=m_pcmcache){
+    m_cachepos = cursize;
+    memmove(m_pcmcache,curhead,cursize);
+  }
+  return rst;
+}
+
+int PcmSession::pushpcm(uint64_t sessid,uint8_t* buf,int len){
+  if(m_finished)return -1;
+  if(m_sessid!=sessid)return -2;
+
+  int rst = 0;
+  uint8_t* curhead = buf;
+  int cursize = len;
+  m_totalpush += len;
+  int allcnt = m_cachepos + cursize;
+
+  while(allcnt >= m_cachemax){
+    int cpsize = m_cachemax - m_cachepos;
+    memcpy(m_pcmcache + m_cachepos,curhead,cpsize);
+    m_cachepos = m_cachemax;
+    cursize -= cpsize;
+    curhead += cpsize;
+    rst += checkpcmcache();
+    allcnt = m_cachepos + cursize; // checkpcmcache() may leave a partial block cached
+  }
+  if(cursize){
+    memcpy(m_pcmcache + m_cachepos,curhead,cursize);
+    m_cachepos += cursize;
+    rst += checkpcmcache();
+  }
+  return rst;
+}
+
+
+int PcmSession::simppcm(uint64_t sessid,uint8_t* buf,int len){
+  if(m_finished)return -1;
+  if(m_sessid!=sessid)return -2;
+  int rst = 0;
+  //printf("==curpos %d len %d\n",m_cachepos,len);
+  uint8_t* curhead = buf;
+  int cursize = len;
+  m_totalpush += len;
+  //int chkblock = m_first&&!m_lastitem?m_mincnt:m_maxcnt;
+  //int chksize = m_first&&!m_lastitem?m_firstsize:m_basesize;
+  //int chkfirst = m_first&&!m_lastitem;
+  if(m_cachepos){
+    int cnt = m_cachepos + len;
+    if(cnt>=m_minsize){
+      int chkblock = cnt / STREAM_BASE_PCM;
+      if(chkblock>m_maxcnt)chkblock = m_maxcnt;
+      int chksize = chkblock * STREAM_BASE_PCM;
+      jmat_t* mat = jmat_alloc(STREAM_BASE_PCM,chkblock,1,0,1,NULL);
+      int cpsize = (m_cachepos > chksize)?chksize:m_cachepos;
+      memcpy(mat->data,m_pcmcache,cpsize);
+      int left = chksize - cpsize;
+      if(left>0) memcpy(mat->data + cpsize,buf,left);
+      //printf("append a %d\n",left);
+      m_cachepos -= cpsize;
+      cursize -= left;
+      curhead += left;
+      rst = appenditem(mat);
+    }else{
+      memcpy(m_pcmcache+ m_cachepos,buf,len);
+      m_cachepos += len;
+      return 0;
+    }
+  }
+  while(cursize>=m_minsize){
+    //printf("pbbb\n");
+    int chkblock = cursize / STREAM_BASE_PCM;
+    if(chkblock>m_maxcnt)chkblock = m_maxcnt;
+    int chksize = chkblock * STREAM_BASE_PCM;
+
+    jmat_t* mat = jmat_alloc(STREAM_BASE_PCM,chkblock,1,0,1,NULL);
+    memcpy(mat->data ,curhead,chksize);
+    curhead += chksize;
+    cursize -= chksize;
+    rst = appenditem(mat);
+  }
+  if(cursize>0){
+    //printf("==cursize %d\n",cursize);
+    memcpy(m_pcmcache,curhead,cursize);
+    m_cachepos = cursize;
+  }
+  return rst;
+}
+
+int PcmSession::conpcm(uint64_t sessid){
+  //if(m_finished)return -1;
+  if(m_sessid!=sessid)return -2;
+  m_cachepos = 0;
+  m_finished = 0;
+  m_curflag ++;
+  return 0;
+}
+
+int PcmSession::finpcm(uint64_t sessid){
+  if(m_finished)return -1;
+  if(m_sessid!=sessid)return -2;
+  checkpcmcache();
+  if(m_cachepos){
+    int block = m_cachepos / STREAM_BASE_PCM;
+    int left = m_cachepos % STREAM_BASE_PCM;
+    if(left)block++;
+    jmat_t* mat = jmat_alloc(STREAM_BASE_PCM,block,1,0,1,NULL);
+    memset(mat->data,0,STREAM_BASE_PCM*block);
+    memcpy(mat->data,m_pcmcache,m_cachepos);
+    appenditem(mat);
+  }
+  m_finished = 1;
+  return 0;
+}
+
+int PcmSession::runfirst(uint64_t sessid,WeAI* weai){
+  if(m_sessid!=sessid)return -2;
+  if(!m_first)return 0;
+  if(m_calccnt || vec_pcm.empty())return 0; // nothing queued yet: wait for more PCM
+  PcmItem* item = vec_pcm[m_calccnt];
+  if(item){
+    item->runWenet(weai);
+    m_numcalc += item->numblock();
+  }
+  m_calccnt ++;
+  m_first = 0;
+  //
+  return 0;
+}
+
+int PcmSession::runcalc(uint64_t sessid,WeAI* weai,int mincalc){
+  if(m_sessid!=sessid)return -2;
+  if(m_first)return -1;
+  int rst = 0;
+  if(m_calccnt<m_workcnt){
+    int dist = m_calccnt - m_readcnt;
+    //printf("===disc %d work %d mincalc %d\n",dist,m_workcnt,mincalc);
+    if(dist<mincalc){
+      PcmItem* item = vec_pcm[m_calccnt];
+      if(item){
+        item->runWenet(weai);
+        m_numcalc += item->numblock();
+      }
+      m_calccnt ++;
+      rst = 1;
+    }
+  }else if(m_finished){
+    rst = -1;
+  }else{
+    rst = 0;
+  }
+  if(rst<1){
+    int dist = m_readcnt - m_clrcnt;
+    if(dist>5){
+      for(int k=0;k<m_readcnt-5;k++){
+        PcmItem* item = vec_pcm[k];
+        vec_pcm[k] = NULL;
+        if(item){ 
+          delete item;
+          m_clrcnt = k;
+        }
+      }
+    }
+  }
+  return rst;
+}
+
+void PcmSession::dump(char* dumpfn){
+  FILE* dumpfile = fopen(dumpfn,"wb");
+  if(!dumpfile)return;
+  printf("===dump %zu\n",vec_pcm.size());
+  for(size_t k=0;k<vec_pcm.size();k++){
+    if(vec_pcm[k])vec_pcm[k]->dump(dumpfile); // entries released by runcalc are NULL
+  }
+  fclose(dumpfile);
+}
+
+int PcmSession::distwait(){
+  printf("===calc %d read %d \n",m_numcalc,m_numread);
+  return m_numpush - m_numread;
+}
+
+int PcmSession::readnext(uint64_t sessid,uint8_t* pcmbuf,int pcmlen,uint8_t* bnfbuf,int bnflen){
+  if(m_sessid!=sessid)return -2;
+  if(pcmlen!=STREAM_BASE_PCM)return -1;
+  if(bnflen!=STREAM_ALL_BNF)return -2;
+  jmat_t* mpcm = jmat_alloc(STREAM_BASE_PCM,1,1,0,1,pcmbuf);
+  jmat_t* mbnf = jmat_alloc(STREAM_BASE_BNF,20,1,0,4,bnfbuf);
+  int rst = readnext(sessid,mpcm,mbnf);
+  jmat_free(mpcm);
+  jmat_free(mbnf);
+  return rst;
+}
+
+int PcmSession::readblock(uint64_t sessid,uint8_t* bnfbuf,int bnflen,int inx){
+  if(m_sessid!=sessid)return -2;
+  if(bnflen!=STREAM_ALL_BNF)return -2;
+  jmat_t* mbnf = jmat_alloc(STREAM_BASE_BNF,20,1,0,4,bnfbuf);
+  int rst = readblock(sessid,mbnf,inx);
+  jmat_free(mbnf);
+  return rst;
+
+}
+
+int PcmSession::readblock(uint64_t sessid,jmat_t* mbnf,int inx){
+  if(m_sessid!=sessid)return -2;
+  if(mbnf->width!=STREAM_BASE_BNF)return -2002; 
+  //if(inx>=m_calccnt)return -99;
+  //printf("===inx %d num %d\n",inx,m_numcalc);
+  if(inx>=m_numcalc)return -99;
+  int rst = 0;
+  PcmItem* curitem = NULL;
+  int newinx = 0;
+  if((inx<m_arrmax[m_readcnt])&&(inx>=m_arrmin[m_readcnt])){
+    curitem = vec_pcm[m_readcnt];
+    newinx = inx - m_arrmin[m_readcnt];
+  }else{
+    for(int k=0;k<m_calccnt;k++){
+      //printf("==k %d max %d min %d\n",k,m_arrmax[k],m_arrmin[k]);
+      if((inx<m_arrmax[k])&&(inx>=m_arrmin[k])){
+        curitem = vec_pcm[k];
+        m_readcnt = k;
+        newinx = inx - m_arrmin[k];
+        break;
+      }
+    }
+  }
+  //printf("===curitem %p inx %d new %d\n",curitem,inx ,newinx);
+  if(curitem){
+    rst = curitem->readblock(newinx,NULL,mbnf);
+    if(rst){
+      mbnf->buf.sessid = inx;
+    }
+    return rst;
+  }
+  return 0;
+}
+
+int PcmSession::readnext(uint64_t sessid,jmat_t* mpcm,jmat_t* mbnf){
+  if(m_sessid!=sessid)return -2;
+  if(mpcm->width!=STREAM_BASE_PCM)return -2001;
+  if(mbnf->width!=STREAM_BASE_BNF)return -2002;
+  if(m_totalread<m_totalpush){
+    //printf("===q %d r %d\n",m_readcnt,m_calccnt);
+    if(m_readcnt<m_calccnt){
+      PcmItem* item = vec_pcm[m_readcnt];      
+      int rst = item->readblock(mpcm,mbnf);
+      if(!rst){
+        m_readcnt++;
+        return 0;
+      }else{
+#ifdef PCMDEBUG
+        if(1){
+          char fn[255];
+          snprintf(fn,sizeof(fn),"out_%d.data",++m_debugout);
+          FILE* df = fopen(fn,"wb");
+          if(df)fwrite(mpcm->data,1,STREAM_BASE_PCM,df);
+          if(df)fclose(df);
+        }
+#endif
+        m_numread += 1;
+        m_totalread+=STREAM_BASE_PCM;
+        return item->itemsentid();
+      }
+    }else{
+      return 0;
+    }
+  }else{
+    return m_finished?-1:0;
+  }
+}
+
+
+
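PcmItem sizes its buffers from the block count as follows. The same `mel` and `bnf` formulas appear in `DhWenet::cntmel` and `DhWenet::cntbnf`.

- `allcnt = m_minoff + block + 2*STREAM_MFCC_FILL` blocks.
- `pcm_allsamp = allcnt*STREAM_BASE_SAMP` samples.
- `mel_allcnt = pcm_allsamp/160+1` mel frames, a 160-sample hop (10 ms at 16 kHz).
- `bnf_allcnt = mel_allcnt*0.25f - 0.75f`, roughly one BNF frame per four mel frames.

The sketch reproduces that arithmetic. The real values of STREAM_BASE_SAMP and STREAM_MFCC_FILL are defined in aicommon.h, which is not part of this excerpt. The constants below are assumptions: 640 samples per block (16 kHz at 25 blocks per second) and 10 padding blocks. `FrameCounts` and `countFrames` are hypothetical names.

```cpp
#include <cassert>

// Assumed values; the real STREAM_BASE_SAMP / STREAM_MFCC_FILL live in aicommon.h.
constexpr int kBaseSamp = 640;    // assumed: 16 kHz / 25 fps = 640 samples per block
constexpr int kMfccFill = 10;     // assumed: padding blocks on each side
constexpr int kHopSamples = 160;  // from "pcm_allsamp/160+1" in PcmItem

struct FrameCounts {
    int allBlocks;  // minoff + block + 2*fill
    int samples;    // total padded samples fed to log_mel
    int melFrames;  // samples / 160 + 1
    int bnfFrames;  // melFrames * 0.25 - 0.75, truncated toward zero
};

inline FrameCounts countFrames(int minoff, int pcmBlocks) {
    FrameCounts c{};
    c.allBlocks = minoff + pcmBlocks + 2 * kMfccFill;
    c.samples = c.allBlocks * kBaseSamp;
    c.melFrames = c.samples / kHopSamples + 1;
    // Same expression as PcmItem and DhWenet::cntbnf; float -> int truncates.
    c.bnfFrames = static_cast<int>(c.melFrames * 0.25f - 0.75f);
    return c;
}

// Bytes in one block of 16-bit mono PCM (2 bytes per sample), under the assumption above.
constexpr int blockBytes() { return kBaseSamp * 2; }
```

Under these assumed constants, 20 blocks with no overlap give 40 padded blocks, 25600 samples, 161 mel frames and 39 BNF frames. The truncation drops the 0.5 from 39.5.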
--- a/duix-sdk/src/main/cpp/dhmfcc/dhpcm.h
+++ b/duix-sdk/src/main/cpp/dhmfcc/dhpcm.h
@ -0,0 +1,168 @@
+
+#pragma once
+#include "dh_data.h"
+#include "aicommon.h"
+#include <mutex>
+#include <vector>
+#include "dhwenet.h"
+#include "wenetai.h"
+
+//#define PCMDEBUG 1
+#define AIRUN_FLAG 1
+class PcmItem{
+  private:
+    uint64_t m_sessid = 0;
+    int   m_minoff = 0;
+    int   m_maxblock = 0;
+
+    int   pcm_allsamp = 0;
+    int   bnf_allcnt = 0;
+    int   mel_allcnt = 0;
+    jmat_t* m_wav = NULL;
+    jmat_t* m_pcm = NULL;
+    jmat_t* m_mel = NULL;
+    jmat_t* m_bnf = NULL;
+    jmat_t* m_bnfflip = NULL;
+    jmat_t* m_mfcc = NULL;
+    int   pcm_block = 0;
+    int   pcm_read = 0;
+    int   pre_block = 0;
+    uint64_t m_pcminx = 0;
+    //gjvad_t* m_vad = NULL;
+    int   m_ready = 0;
+    int m_sentid = 1;
+    int m_flip = 0;
+    int m_inx = 0;
+    jmat_t* mat_flip = NULL;
+  public:
+    int itemsentid(){return m_sentid;};
+    int blocks(){return pcm_block;};
+    int ready(){return m_ready;};
+    int finished(){return pcm_read>=pcm_block;};
+    int reset();
+    PcmItem(int sentid,int minoff  ,int maxblock ,int flip,int inx);
+    int fillPcm(uint64_t sessid,uint64_t tickinx,jmat_t* premat,jmat_t* mat);
+    int checkValid(uint64_t tickinx);
+    jmat_t* readlast(int minoff);
+    int runWenet(WeAI* weai);
+    int readblock(jmat_t* pcm,jmat_t* mfcc);
+    int readblock(int inx,jmat_t* pcm,jmat_t* mfcc);
+    int readbnf(char* buf);
+    int numblock();
+    int startinx(){return m_inx;};
+    int endinx(){return m_inx+pcm_block;};
+    int readblock();
+    void dump(FILE* dumpfile);
+    ~PcmItem();
+};
+
+class PcmFile{
+  private:
+    int         m_fps = 25;
+    int         m_adj = 0;
+    float       m_scale = 1.0f;
+    int         m_minoff = 0;
+    int         m_mincnt = 0;
+    int         m_maxcnt = 0;
+    int         m_minsize = 0;
+    int         m_maxsize = 0;
+
+    int         m_fileblock = 0;
+    int         m_calcblock = 0;
+    int         m_clrcnt = 0;
+    int         m_readcnt = 0;
+    int         m_calccnt = 0;
+    int         *m_arrmax = NULL;
+    int         *m_arrmin = NULL;
+    std::vector<PcmItem*>  vec_pcm ;
+    int       appenditem(jmat_t* mat,int noone=0);
+    PcmItem     *m_lastitem = NULL;
+    PcmItem     *m_lastread = NULL;
+    int         m_presize = 0;
+    int         m_preblock = 0;
+    PcmItem     *m_preitem = NULL;
+    int         m_flip = 0;
+  public:
+    PcmFile(int fps = 25,int minoff = STREAM_BASE_MINOFF,int mincnt = STREAM_BASE_MINBLOCK,int maxcnt = STREAM_BASE_MAXBLOCK);
+    int setflip(int flip);
+    int prepare(std::string& pcmfn);
+    int prepare(char* buf,int size,char* prebuf = NULL,int presize = 0);
+    int itemSize();
+    int process(int inx,WeAI* ai);
+    int readblock(int sinx,jmat_t* pcm,jmat_t* feat);
+    jmat_t* readbnf(int sinx);
+    int readbnf(char* bnf,int bnfsize);
+    int fileBlock(){return m_fileblock*m_scale;};
+    int calcBlock(){return m_calcblock*m_scale;};
+    virtual ~PcmFile();
+};
+
+class PcmSession{
+  private:
+    uint64_t    m_sessid = 0;   // uint64_t to match the constructor argument and sessid()
+
+    int         m_minoff = 0;
+    int         m_mincnt = 0;
+    int         m_maxcnt = 0;
+    int         m_minsize = 0;
+    int         m_maxsize = 0;
+    //int         m_basesize = 0;
+    //int         m_firstsize = 0;
+
+    int         m_cachepos = 0;
+    int         m_cachemax = 0;
+    uint8_t      *m_pcmcache = NULL;
+
+    std::mutex  m_lock;
+    int         *m_arrflag;
+    int         m_curflag = 1;
+
+    std::vector<PcmItem*>  vec_pcm ;
+    PcmItem     *m_lastitem = NULL;
+
+    volatile int         m_clrcnt = 0;
+    volatile int         m_workcnt = 0;
+    volatile int         m_readcnt = 0;
+    volatile int         m_calccnt = 0;
+    int       appenditem(jmat_t* mat,int noone=0);
+
+    volatile int       m_totalpush = 0;
+    volatile int       m_totalread = 0;
+    volatile int       m_finished = 0;
+    int       m_first = 1;
+    int     m_debuginx = 0;
+    int     m_debugout = 0;
+    int     checkpcmcache(int flush=0);
+    int     m_numcalc = 0;
+    int     m_numread = 0;
+    int     m_numpush = 0;
+    int     distwait();
+    int     m_checkcnt = 0;
+    int     m_flip = 0;
+    int         *m_arrmax = NULL;
+    int         *m_arrmin = NULL;
+    int         m_fileblock = 0;
+    int         m_calcblock = 0;
+  public:
+    int setflip(int flip);
+    uint64_t sessid(){return m_sessid;};
+    int simppcm(uint64_t sessid,uint8_t* buf,int len);
+    int pushpcm(uint64_t sessid,uint8_t* buf,int len);
+    int finpcm(uint64_t sessid);
+    int conpcm(uint64_t sessid);
+    int runcalc(uint64_t sessid,WeAI* weai,int mincalc=1);
+    int runfirst(uint64_t sessid,WeAI* weai);
+    int readnext(uint64_t sessid,jmat_t* mpcm,jmat_t* mbnf);
+    int readnext(uint64_t sessid,uint8_t* pcmbuf,int pcmlen,uint8_t* bnfbuf,int bnflen);
+    int readblock(uint64_t sessid,jmat_t* mbnf,int index);
+    int readblock(uint64_t sessid,uint8_t* bnfbuf,int bnflen,int inx);
+    PcmSession(uint64_t sessid,int minoff = STREAM_BASE_MINOFF,int mincnt = STREAM_BASE_MINBLOCK,int maxcnt = STREAM_BASE_MAXBLOCK);
+    ~PcmSession();
+    void dump(char* dumpfn);
+    int first(){return m_first;};
+    int fileBlock(){return m_fileblock;};
+    //int calcBlock(){return m_calcblock;};
+    int calcBlock(){return m_numcalc;};
+};
+
+
--- a/duix-sdk/src/main/cpp/dhmfcc/dhwenet.cpp
+++ b/duix-sdk/src/main/cpp/dhmfcc/dhwenet.cpp
@ -0,0 +1,44 @@
+#include "dhwenet.h"
+#include <stdio.h>
+#include <vector>
+#include <string>
+#include "aicommon.h"
+#include "mfcc/mfcc.hpp"
+
+
+int DhWenet::cntmel(int pcmblock){
+  int allcnt = pcmblock + 2*STREAM_MFCC_FILL;
+  int pcm_allsamp = allcnt*STREAM_BASE_SAMP;
+  int mel_allcnt = pcm_allsamp/160+1;
+  return mel_allcnt;
+}
+
+int DhWenet::cntbnf(int melblock){
+  int bnf_allcnt = melblock*0.25f - 0.75f;
+  return bnf_allcnt;
+}
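The block counts in `DhWenet::cntmel` and `DhWenet::cntbnf` can be checked by hand. This is a minimal sketch that takes the fill and per-block sample counts as parameters, because the real `STREAM_MFCC_FILL` and `STREAM_BASE_SAMP` values are defined elsewhere and are not shown here. The names `countMelFrames` and `countBnfFrames` are illustrative only.

```cpp
#include <cassert>

// Sketch of the frame-count arithmetic in DhWenet::cntmel / cntbnf.
// fillBlocks and samplesPerBlock stand in for STREAM_MFCC_FILL and
// STREAM_BASE_SAMP (assumed parameters; the real constants live elsewhere).
constexpr int kHopLength = 160;  // 10 ms hop at 16 kHz, as used by log_mel()

int countMelFrames(int pcmBlocks, int fillBlocks, int samplesPerBlock) {
    int totalBlocks = pcmBlocks + 2 * fillBlocks;    // fill on both sides
    int totalSamples = totalBlocks * samplesPerBlock;
    return totalSamples / kHopLength + 1;            // centered STFT frame count
}

// Same result as cntbnf()'s float expression melBlocks * 0.25f - 0.75f
// (truncated toward zero on assignment to int), in integer arithmetic.
int countBnfFrames(int melFrames) {
    return (melFrames - 3) / 4;
}
```

For example, with a hypothetical 640 samples per block (one 25 fps video frame of 16 kHz audio) and no fill, 10 blocks give 6400 samples, 6400 / 160 + 1 = 41 mel frames, and (41 - 3) / 4 = 9 BNF frames.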
+
+int DhWenet::calcmfcc(float* fwav,float* mel2){
+    int rst = 0;
+    int melcnt = MFCC_WAVCHUNK/160+1;
+    rst = log_mel(fwav,MFCC_WAVCHUNK, 16000,mel2);
+    return rst;
+}
+
+int DhWenet::calcmfcc(float* fwav,int fsample,float* mel2,int melcnt){
+    int rst = 0;
+    rst = log_mel(fwav,fsample, 16000,mel2);
+    return rst;
+}
+
+int DhWenet::calcmfcc(jmat_t* mwav,jmat_t* mmel){
+    int rst = 0;
+    int melcnt = MFCC_WAVCHUNK/160+1;
+    for(size_t k=0;k<mwav->height;k++){
+        float* fwav = (float*)jmat_row(mwav,k);
+        float* mel2 = (float*)jmat_row(mmel,k);
+        rst = log_mel(fwav,MFCC_WAVCHUNK, 16000,mel2);
+    }
+    return rst;
+}
+
--- a/duix-sdk/src/main/cpp/dhmfcc/dhwenet.h
+++ b/duix-sdk/src/main/cpp/dhmfcc/dhwenet.h
@ -0,0 +1,14 @@
+#pragma once
+#include "dh_data.h"
+#include "wenetai.h"
+#include <mutex> 
+
+class DhWenet{
+    public:
+        static int calcmfcc(jmat_t* mwav,jmat_t* mmel);
+        static int calcmfcc(float* fwav,float* mel2);
+        static int calcmfcc(float* fwav,int fsample,float* mel2,int melcnt);
+        static int cntmel(int pcmblock);
+        static int cntbnf(int melblock);
+
+};
--- a/duix-sdk/src/main/cpp/dhmfcc/iir_filter.cpp
+++ b/duix-sdk/src/main/cpp/dhmfcc/iir_filter.cpp
@ -0,0 +1,310 @@
+#include "mfcc/iir_filter.hpp"
+#include <stdio.h>
+#include <cmath>  // fmax() in IIR_II::setPara
+
+#ifdef UES_IIR_I
+
+void IIR_I::reset()
+{
+    for(int i = 0; i <= m_num_order; i++)
+    {
+        m_pNum[i] = 0.0;
+    }
+    for(int i = 0; i <= m_den_order; i++)
+    {
+        m_pDen[i] = 0.0;
+    }
+}
+IIR_I::IIR_I()
+{
+    m_pNum = NULL;
+    m_pDen = NULL;
+    m_px = NULL;
+    m_py = NULL;
+    m_num_order = -1;
+    m_den_order = -1;
+};
+IIR_I::~IIR_I()
+{
+    delete[] m_pNum;
+    delete[] m_pDen;
+    delete[] m_px;
+    delete[] m_py;
+    m_pNum = NULL;
+    m_pDen = NULL;
+    m_px = NULL;
+    m_py = NULL;
+};
+
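+/** \brief Sets the filter coefficients and clears the internal state.
+ *
+ * \param num numerator polynomial coefficients in ascending order; num[0] is the constant term
+ * \param num_order numerator order (note: only num[0..num_order-1] are read; the last coefficient is zero-filled)
+ * \param den denominator polynomial coefficients in ascending order; den[0] is the constant term
+ * \param den_order denominator order (note: only den[0..den_order-1] are read; the last coefficient is zero-filled)
+ * \return
+ */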
+void IIR_I::setPara(double num[], int num_order, double den[], int den_order)
+{
+    delete[] m_pNum;
+    delete[] m_pDen;
+    delete[] m_px;
+    delete[] m_py;
+    m_pNum = new double[num_order + 1];
+    m_pDen = new double[den_order + 1];
+    m_num_order = num_order;
+    m_den_order = den_order;
+    m_px = new double[num_order + 1];
+    m_py = new double[den_order + 1];
+    for(int i = 0; i < m_num_order; i++)
+    {
+        m_pNum[i] = num[i];
+        m_px[i] = 0.0;
+    }
+    m_pNum[m_num_order] = 0.0;
+    m_px[m_num_order] = 0.0;
+    for(int i = 0; i < m_den_order; i++)
+    {
+        m_pDen[i] = den[i];
+        m_py[i] = 0.0;
+    }
+    m_pDen[m_den_order] = 0.0;
+    m_py[m_den_order] = 0.0;
+}
+
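+/** \brief Computes the time-domain response of the IIR filter without changing its internal state.
+ * \param data_in filter input; inputs before time 0 are taken as 0, and data_in[M] onward are taken as data_in[M-1]
+ * \param data_out filter output
+ * \param M length of the input data
+ * \param N length of the output data
+ * \return
+ */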
+void IIR_I::resp(double data_in[], int M, double data_out[], int N)
+{
+    int i, k, il;
+    for(k = 0; k < N; k++)
+    {
+        data_out[k] = 0.0;
+        for(i = 0; i <= m_num_order; i++)
+        {
+            if( k - i >= 0)
+            {
+                il = ((k - i) < M) ? (k - i) : (M - 1);
+                data_out[k] = data_out[k] + m_pNum[i] * data_in[il];
+            }
+        }
+        for(i = 1; i <= m_den_order; i++)
+        {
+            if( k - i >= 0)
+            {
+                data_out[k] = data_out[k] - m_pDen[i] * data_out[k - i];
+            }
+        }
+    }
+}
+
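+/** \brief Filtering function using the Direct Form I structure.
+ * Note: this function was modified internally, following the design of scipy.signal.lfilter, when porting librosa.pcen.
+ *
+ * \param data_in[] input data
+ * \param data_out[] receives the filtered data
+ * \param len array length
+ * \return
+ */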
+void IIR_I::filter(double data_in[], double data_out[], int len)
+{
+    int i, k;
+    m_py[1] = 1; // Modified: the formula uses y[n-k], so the first sample would reference y[-1]; pcen treats y[-1] as 1.
+    for(k = 0; k < len; k++)
+    {
+        m_px[0] = data_in[k];
+        m_py[0] = 0.0;
+        for(i = 0; i <= m_num_order; i++)
+        {
+            m_py[0] = m_py[0] + m_pNum[i] * m_px[i];
+        }
+        for(i = 1; i <= m_den_order; i++)
+        {
+            m_py[0] = m_py[0] - m_pDen[i] * m_py[i];
+        }
+        for(i = m_num_order; i >= 1; i--)
+        {
+            m_px[i] = m_px[i-1];
+        }
+        for(i = m_den_order; i >= 1; i--)
+        {
+            m_py[i] = m_py[i-1];
+        }
+        data_out[k] = m_py[0];
+    }
+}
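The modified `IIR_I::filter` is a Direct Form I recursion in which the first output sees y[-1] = 1. Below is a minimal, stateless sketch of the same recursion, assuming a[0] == 1 and fresh state on every call (the class above instead keeps `m_px`/`m_py` between calls). `lfilterDirectFormI` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sketch of a Direct Form I IIR filter in the style of scipy.signal.lfilter,
// assuming a[0] == 1. initialPrevOutput seeds y[-1], mirroring the change in
// IIR_I::filter where y[-1] is forced to 1 for PCEN. Earlier outputs
// (y[-2], ...) and all inputs before time 0 are treated as 0.
std::vector<double> lfilterDirectFormI(const std::vector<double>& b,
                                       const std::vector<double>& a,
                                       const std::vector<double>& x,
                                       double initialPrevOutput = 0.0) {
    assert(!a.empty() && a[0] == 1.0);
    std::vector<double> y(x.size(), 0.0);
    for (std::size_t n = 0; n < x.size(); ++n) {
        double acc = 0.0;
        // Feed-forward part: sum_k b[k] * x[n-k]
        for (std::size_t k = 0; k < b.size() && k <= n; ++k) {
            acc += b[k] * x[n - k];
        }
        // Feedback part: - sum_{k>=1} a[k] * y[n-k]
        for (std::size_t k = 1; k < a.size(); ++k) {
            double prev = 0.0;
            if (k <= n) {
                prev = y[n - k];
            } else if (k == n + 1) {  // n - k == -1
                prev = initialPrevOutput;
            }
            acc -= a[k] * prev;
        }
        y[n] = acc;
    }
    return y;
}
```

With the coefficients `pcen()` passes to `setPara` (b0 = 0.05638943879134889, a1 = -0.9436105612086512), this reduces to y[n] = b0 * x[n] + 0.9436... * y[n-1]; since b0 + 0.9436... = 1, it is an exponential moving average of each mel channel.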
+
+#endif
+
+#ifdef UES_IIR_II
+
+
+IIR_II::IIR_II()
+{
+    m_pNum = NULL;
+    m_pDen = NULL;
+    m_pW = NULL;
+    m_num_order = -1;
+    m_den_order = -1;
+    m_N = 0;
+}
+
+void IIR_II::reset()
+{
+    for(int i = 0; i < m_N; i++)
+    {
+        m_pW[i] = 0.0;
+    }
+}
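+/** \brief Sets the filter coefficients and clears the internal state.
+ *
+ * \param num numerator polynomial coefficients in ascending order; num[0] is the constant term
+ * \param num_order numerator order
+ * \param den denominator polynomial coefficients in ascending order; den[0] is the constant term
+ * \param den_order denominator order
+ * \return
+ */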
+void IIR_II::setPara(double num[], int num_order, double den[], int den_order)
+{
+    delete[] m_pNum;
+    delete[] m_pDen;
+    delete[] m_pW;
+    m_num_order = num_order;
+    m_den_order = den_order;
+    m_N = fmax(num_order, den_order) + 1;
+    m_pNum = new double[m_N];
+    m_pDen = new double[m_N];
+    m_pW = new double[m_N];
+    for(int i = 0; i < m_N; i++)
+    {
+        m_pNum[i] = 0.0;
+        m_pDen[i] = 0.0;
+        m_pW[i] = 0.0;
+    }
+    for(int i = 0; i <= num_order; i++)
+    {
+        m_pNum[i] = num[i];
+    }
+    for(int i = 0; i <= den_order; i++)
+    {
+        m_pDen[i] = den[i];
+    }
+}
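+/** \brief Computes the time-domain response of the IIR filter without changing its internal state.
+ * \param data_in filter input; inputs before time 0 are taken as 0, and data_in[M] onward are taken as data_in[M-1]
+ * \param data_out filter output
+ * \param M length of the input data
+ * \param N length of the output data
+ * \return
+ */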
+void IIR_II::resp(double data_in[], int M, double data_out[], int N)
+{
+    int i, k, il;
+    for(k = 0; k < N; k++)
+    {
+        data_out[k] = 0.0;
+        for(i = 0; i <= m_num_order; i++)
+        {
+            if( k - i >= 0)
+            {
+                il = ((k - i) < M) ? (k - i) : (M - 1);
+                data_out[k] = data_out[k] + m_pNum[i] * data_in[il];
+            }
+        }
+        for(i = 1; i <= m_den_order; i++)
+        {
+            if( k - i >= 0)
+            {
+                data_out[k] = data_out[k] - m_pDen[i] * data_out[k - i];
+            }
+        }
+    }
+}
+/** \brief Filtering function using the Direct Form II structure.
+ *
+ * \param data input sample
+ * \return the filtered result
+ */
+double IIR_II::filter(double data)
+{
+    m_pW[0] = data;
+    for(int i = 1; i <= m_den_order; i++) // update the state w[n] first
+    {
+        m_pW[0] = m_pW[0] - m_pDen[i] * m_pW[i];
+    }
+    data = 0.0;
+    for(int i = 0; i <= m_num_order; i++)
+    {
+        data = data + m_pNum[i] * m_pW[i];
+    }
+    for(int i = m_N - 1; i >= 1; i--)
+    {
+        m_pW[i] = m_pW[i-1];
+    }
+    return data;
+}
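`IIR_II::filter(double)` keeps a single delay line w and updates it before computing the output. A sketch of the same per-sample update order, assuming a[0] == 1; the class name `DirectFormII` is illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Sketch of a Direct Form II IIR section processed one sample at a time,
// with the same update order as IIR_II::filter(double):
//   w[n] = x[n] - sum_{k>=1} a[k] * w[n-k]
//   y[n] = sum_{k>=0} b[k] * w[n-k]
// Assumes a[0] == 1; b and a are zero-padded to a common length.
class DirectFormII {
public:
    DirectFormII(std::vector<double> b, std::vector<double> a)
        : b_(std::move(b)), a_(std::move(a)) {
        assert(!a_.empty() && a_[0] == 1.0);
        std::size_t n = std::max(b_.size(), a_.size());
        b_.resize(n, 0.0);
        a_.resize(n, 0.0);
        w_.assign(n, 0.0);  // w_[k] holds w[n-k]; state starts at zero
    }

    double process(double x) {
        w_[0] = x;
        for (std::size_t k = 1; k < a_.size(); ++k) w_[0] -= a_[k] * w_[k];
        double y = 0.0;
        for (std::size_t k = 0; k < b_.size(); ++k) y += b_[k] * w_[k];
        // Shift the delay line so w[n] becomes w[n-1] for the next sample
        for (std::size_t k = w_.size() - 1; k >= 1; --k) w_[k] = w_[k - 1];
        return y;
    }

private:
    std::vector<double> b_, a_, w_;
};
```

Direct Form II needs only max(len(b), len(a)) state values, whereas Direct Form I keeps separate input and output histories; with zero initial state both produce the same output for the same coefficients.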
+/** \brief Filtering function using the Direct Form II structure.
+ *
+ * \param data[] input data; on return it holds the filtered result
+ * \param len length of the data[] array
+ * \return
+ */
+void IIR_II::filter(double data[], int len)
+{
+    int i, k;
+    for(k = 0; k < len; k++)
+    {
+        m_pW[0] = data[k];
+        for(i = 1; i <= m_den_order; i++) // update the state w[n] first
+        {
+            m_pW[0] = m_pW[0] - m_pDen[i] * m_pW[i];
+        }
+        data[k] = 0.0;
+        for(i = 0; i <= m_num_order; i++)
+        {
+            data[k] = data[k] + m_pNum[i] * m_pW[i];
+        }
+
+        for(i = m_N - 1; i >= 1; i--)
+        {
+            m_pW[i] = m_pW[i-1];
+        }
+    }
+}
+/** \brief Filtering function using the Direct Form II structure.
+ *
+ * \param data_in[] input data
+ * \param data_out[] receives the filtered data
+ * \param len array length
+ * \return
+ */
+void IIR_II::filter(double data_in[], double data_out[], int len)
+{
+    int i, k;
+    for(k = 0; k < len; k++)
+    {
+        m_pW[0] = data_in[k];
+        for(i = 1; i <= m_den_order; i++) // update the state w[n] first
+        {
+            m_pW[0] = m_pW[0] - m_pDen[i] * m_pW[i];
+        }
+        data_out[k] = 0.0;
+        for(i = 0; i <= m_num_order; i++)
+        {
+            data_out[k] = data_out[k] + m_pNum[i] * m_pW[i];
+        }
+
+        for(i = m_N - 1; i >= 1; i--)
+        {
+            m_pW[i] = m_pW[i-1];
+        }
+    }
+}
+
+#endif
+
--- a/duix-sdk/src/main/cpp/dhmfcc/mfcc.cpp
+++ b/duix-sdk/src/main/cpp/dhmfcc/mfcc.cpp
@ -0,0 +1,369 @@
+#include "mfcc/mfcc.hpp"
+#include "mfcc/AudioFFT.hpp"
+#include "mfcc/iir_filter.hpp"
+#include "opencv2/core.hpp"
+
+static int nSamplesPerSec = 16000;
+static int length_DFT = 1024;//2048;
+static int hop_length = 160;//int(0.05 * nSamplesPerSec);
+static int win_length = 800;// int(0.1 * nSamplesPerSec);
+static int number_filterbanks = 80;
+static float preemphasis = 0.97;
+static int max_db = 100;
+static int ref_db = 20;
+static int r = 1;
+static double pi = 3.14159265358979323846;
+
+static cv::Mat_<double> mel_basis;
+static cv::Mat_<float> hannWindow;
+
+static std::shared_ptr<IIR_I> filter;
+
+//"""Convert Hz to Mels"""
+static double hz_to_mel(double frequencies, bool htk = false) {
+    if (htk) {
+        return 2595.0 * log10(1.0 + frequencies / 700.0);
+    }
+    // Fill in the linear part
+    double f_min = 0.0;
+    double f_sp = 200.0 / 3;
+    double mels = (frequencies - f_min) / f_sp;
+    // Fill in the log-scale part
+    double min_log_hz = 1000.0;                         // beginning of log region (Hz)
+    double min_log_mel = (min_log_hz - f_min) / f_sp;   // same (Mels)
+    double logstep = log(6.4) / 27.0;              // step size for log region
+
+    // Ported to match Python's librosa library
+    // Array (multi-dimensional) case:
+//    if (frequencies.ndim) {
+//        // If we have array data, vectorize
+//        log_t = (frequencies >= min_log_hz)
+//        mels[log_t] = min_log_mel + np.log(frequencies[log_t] / min_log_hz) / logstep
+//    } else
+    if (frequencies >= min_log_hz) {
+        // If we have scalar data, check directly
+        mels = min_log_mel + log(frequencies / min_log_hz) / logstep;
+    }
+    return mels;
+}
+
+//"""Convert mel bin numbers to frequencies"""
+static cv::Mat_<double> mel_to_hz(cv::Mat_<double> mels, bool htk = false) {
+//    if (htk) {
+//        return //python://700.0 * (10.0**(mels / 2595.0) - 1.0);
+//    }
+    // Fill in the linear scale
+    double f_min = 0.0;
+    double f_sp = 200.0 / 3;
+    cv::Mat_<double> freqs = mels * f_sp + f_min;
+    // And now the nonlinear scale
+    double min_log_hz = 1000.0;                         // beginning of log region (Hz)
+    double min_log_mel = (min_log_hz - f_min) / f_sp;   // same (Mels)
+    double logstep = log(6.4) / 27.0;              // step size for log region
+    // Ported to match Python's librosa library
+    //if (mels.ndim) {
+    // If we have vector data, vectorize
+    cv::Mat_<bool> log_t = (mels >= min_log_mel);
+    for (int i = 0; i < log_t.cols; i++) {
+        if (log_t(0, i)) {
+            freqs(0, i) = cv::exp((mels(0, i) - min_log_mel) * logstep) * min_log_hz;
+        }
+    }
+    //}
+    return freqs;
+}
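`hz_to_mel` and `mel_to_hz` implement the Slaney mel scale that librosa uses when `htk=False`: linear at 200/3 Hz per mel below 1 kHz and logarithmic above it. A scalar sketch of both directions; the namespace `slaney` and the function names are illustrative.

```cpp
#include <cassert>
#include <cmath>

// Sketch of the Slaney-style mel scale used by hz_to_mel / mel_to_hz above.
namespace slaney {
constexpr double kFMin = 0.0;
constexpr double kFSp = 200.0 / 3.0;                       // Hz per mel, linear region
constexpr double kMinLogHz = 1000.0;                       // start of the log region
constexpr double kMinLogMel = (kMinLogHz - kFMin) / kFSp;  // about 15 mel
const double kLogStep = std::log(6.4) / 27.0;              // log-region step size

inline double hzToMel(double hz) {
    if (hz >= kMinLogHz) return kMinLogMel + std::log(hz / kMinLogHz) / kLogStep;
    return (hz - kFMin) / kFSp;
}

inline double melToHz(double mel) {
    if (mel >= kMinLogMel) return kMinLogHz * std::exp((mel - kMinLogMel) * kLogStep);
    return kFMin + kFSp * mel;
}
}  // namespace slaney
```

At 1000 Hz the scale gives 15 mel, and each further factor of 6.4 in frequency adds 27 mel, so 6400 Hz maps to 42 mel.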
+
+static cv::Mat_<double> cvlinspace(double min_, double max_, int length) {
+    auto cvmat = cv::Mat_<double>(1, length);
+    for (int i = 0; i < length; i++) {
+        cvmat(0, i) = ((max_ - min_) / (length - 1) * i) + min_;
+    }
+    return cvmat;
+}
+
+//"""Create a Filterbank matrix to combine FFT bins into Mel-frequency bins"""
+static cv::Mat_<double> mel_spectrogram_create(int nps, int n_fft, int n_mels) {
+    double f_max = nps / 2.0;
+    double f_min = 0;
+    int n_fft_2 = 1 + n_fft / 2;
+    // Initialize the weights
+    //auto weights = nc::zeros<double>(nc::uint32(n_mels), nc::uint32(n_fft_2));
+    auto weights = cv::Mat_<double>(n_mels, n_fft_2, 0.0);
+    // Center freqs of each FFT bin
+    //auto fftfreqs_ = nc::linspace<double>(f_min, f_max, nc::uint32(n_fft_2), true);
+    auto fftfreqs = cvlinspace(f_min, f_max, n_fft_2);
+
+    // 'Center freqs' of mel bands - uniformly spaced between limits
+    double min_mel = hz_to_mel(f_min, false);
+    double max_mel = hz_to_mel(f_max, false);
+    //auto mels_ = nc::linspace(min_mel, max_mel, nc::uint32(n_mels + 2));
+    auto mels = cvlinspace(min_mel, max_mel, n_mels + 2);
+    auto mel_f = mel_to_hz(mels, false);
+
+    //auto fdiff_ = nc::diff(mel_f_); // discrete difference along the given axis (each element minus the previous one)
+    cv::Mat_<double> d1(1, mel_f.cols * mel_f.rows - 1, (double *) (mel_f.data) + 1);
+    cv::Mat_<double> d2(1, mel_f.cols * mel_f.rows - 1, (double *) (mel_f.data));
+    cv::Mat_<double> fdiff = d1 - d2;
+
+    //auto ramps = nc::subtract.outer(mel_f, fftfreqs); // NumCpp has no subtract.outer
+    //nc::NdArray<double> ramps = nc::zeros<double>(mel_f.cols, fftfreqs.cols);
+    auto ramps = cv::Mat_<double>(mel_f.cols, fftfreqs.cols);
+    for (int i = 0; i < mel_f.cols; i++) {
+        for (int j = 0; j < fftfreqs.cols; j++) {
+            ramps(i, j) = mel_f(0, i) - fftfreqs(0, j);
+        }
+    }
+
+    for (int i = 0; i < n_mels; i++) {
+        // lower and upper slopes for all bins
+        //auto ramps_1 = nc::NdArray<double>(1, ramps.cols);
+        auto ramps_1 = cv::Mat_<double>(1, ramps.cols);
+        for (int j = 0; j < ramps.cols; j++) {
+            ramps_1(0, j) = ramps(i, j);
+        }
+        //auto ramps_2 = nc::NdArray<double>(1, ramps.cols);
+        auto ramps_2 = cv::Mat_<double>(1, ramps.cols);
+        for (int j = 0; j < ramps.cols; j++) {
+            ramps_2(0, j) = ramps(i + 2, j);
+        }
+        cv::Mat_<double> lower = ramps_1 * -1 / fdiff(0, i);
+        cv::Mat_<double> upper = ramps_2 / fdiff(0, i + 1);
+        // .. then intersect them with each other and zero
+        //auto weights_1 = nc::maximum(nc::zeros<double>(1, ramps.cols), nc::minimum(lower, upper));
+        cv::Mat weights_1 = cv::Mat_<double>(1, lower.cols);
+
+        cv::Mat c1 = lower;//(cv::Mat_<double>(1,5) << 1,2,-3,4,-5);
+        cv::Mat c2 = upper;
+        cv::min(c1, c2, weights_1);
+        cv::max(weights_1, 0, weights_1);
+
+        for (int j = 0; j < n_fft_2; j++) {
+            /*
+            double da = lower(0,j);
+            double db = upper(0,j);
+            double dc = da>db?db:da;
+            if(dc<0)dc = 0;
+            weights(i, j) = dc;//weights_1.at<double_t>(0, j);
+            */
+            weights(i, j) = weights_1.at<double_t>(0, j);
+        }
+    }
+
+    // Slaney-style mel is scaled to be approx constant energy per channel
+    auto enorm = cv::Mat_<double>(1, n_mels);
+    for (int j = 0; j < n_mels; j++) {
+        enorm(0, j) = 2.0 / (mel_f(0, j + 2) - mel_f(0, j));
+    }
+    for (int j = 0; j < n_mels; j++) {
+        for (int k = 0; k < n_fft_2; k++) {
+            weights(j, k) *= enorm(0, j);
+        }
+    }
+    return weights;
+}
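`mel_spectrogram_create` builds overlapping triangles between consecutive mel band edges and scales each one by 2 / (f[i+2] - f[i]) (Slaney normalization). Below is a sketch with `std::vector` in place of `cv::Mat_`, assuming the same Slaney mel scale. `melFilterbank` is a hypothetical name.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

// Slaney mel scale (librosa htk=False), compact form.
static double hzToMel(double hz) {
    const double fsp = 200.0 / 3.0, logstep = std::log(6.4) / 27.0;
    return hz < 1000.0 ? hz / fsp : 1000.0 / fsp + std::log(hz / 1000.0) / logstep;
}
static double melToHz(double mel) {
    const double fsp = 200.0 / 3.0, logstep = std::log(6.4) / 27.0, minLogMel = 1000.0 / fsp;
    return mel < minLogMel ? mel * fsp : 1000.0 * std::exp((mel - minLogMel) * logstep);
}

// Sketch of the triangular filterbank: returns nMels x (nFft/2 + 1) weights.
std::vector<std::vector<double>> melFilterbank(int sampleRate, int nFft, int nMels) {
    const int nBins = nFft / 2 + 1;
    const double fMax = sampleRate / 2.0;
    // Center frequency of each FFT bin: linspace(0, fMax, nBins)
    std::vector<double> fftFreqs(nBins);
    for (int j = 0; j < nBins; ++j) fftFreqs[j] = fMax * j / (nBins - 1);
    // nMels + 2 band edges, uniformly spaced on the mel scale
    const double maxMel = hzToMel(fMax);
    std::vector<double> melF(nMels + 2);
    for (int i = 0; i < nMels + 2; ++i) melF[i] = melToHz(maxMel * i / (nMels + 1));

    std::vector<std::vector<double>> weights(nMels, std::vector<double>(nBins, 0.0));
    for (int i = 0; i < nMels; ++i) {
        const double lowerWidth = melF[i + 1] - melF[i];
        const double upperWidth = melF[i + 2] - melF[i + 1];
        const double enorm = 2.0 / (melF[i + 2] - melF[i]);  // Slaney normalization
        for (int j = 0; j < nBins; ++j) {
            double lower = (fftFreqs[j] - melF[i]) / lowerWidth;      // rising edge
            double upper = (melF[i + 2] - fftFreqs[j]) / upperWidth;  // falling edge
            weights[i][j] = enorm * std::max(0.0, std::min(lower, upper));
        }
    }
    return weights;
}
```

With this normalization each triangle has unit area in Hz, so summing a row multiplied by the bin spacing (sample_rate / n_fft) gives roughly 1, which is the "approximately constant energy per channel" the source comment refers to.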
+
+//"""Short-time Fourier transform (STFT)""": defaults center=True, window='hann', pad_mode='reflect'
+static cv::Mat_<double> MagnitudeSpectrogram(const cv::Mat_<float> *emphasis_data, int n_fft = 2048, int hop_length = 0, int win_length = 0) {
+    if (win_length == 0) {
+        win_length = n_fft;
+    }
+    if (hop_length == 0) {
+        hop_length = win_length / 4;
+    }
+
+    int pad_length = n_fft / 2;
+    cv::Mat_<float> cv_padbuffer;
+    cv::copyMakeBorder(*emphasis_data, cv_padbuffer, 0, 0, pad_length, pad_length, cv::BORDER_REFLECT_101);
+
+    if (n_fft < win_length) {
+        // The analysis window cannot be longer than the FFT frame.
+        return cv::Mat_<double>(0, 0);
+    }
+    if (hannWindow.empty()) {
+        hannWindow = cv::Mat_<float>(1, n_fft, 0.0f);
+        // Center the win_length-point window inside the n_fft-point frame.
+        int insert_cnt = (n_fft - win_length) / 2;
+        for (int k = 1; k <= win_length; k++) {
+            hannWindow(0, k - 1 + insert_cnt) = float(0.5 * (1 - cos(2 * pi * k / (win_length + 1))));
+        }
+    }
+    int size = cv_padbuffer.rows * cv_padbuffer.cols;//padbuffer.size()
+    int number_feature_vectors = (size - n_fft) / hop_length + 1;
+    int number_coefficients = n_fft / 2 + 1;
+    cv::Mat_<float> feature_vector(number_feature_vectors, number_coefficients, 0.0f);
+
+    audiofft::AudioFFT fft;
+    fft.init(size_t(n_fft));
+    for (int i = 0; i <= size - n_fft; i += hop_length) {
+        cv::Mat_<float> framef = cv::Mat_<float>(1, n_fft, (float *) (cv_padbuffer.data) + i).clone();
+        framef = framef.mul(hannWindow);
+
+        cv::Mat_<float> Xrf(1, number_coefficients);
+        cv::Mat_<float> Xif(1, number_coefficients);
+        fft.fft((float *) (framef.data), (float *) (Xrf.data), (float *) (Xif.data));
+
+        cv::pow(Xrf, 2, Xrf);
+        cv::pow(Xif, 2, Xif);
+        cv::Mat_<float> cv_feature(1, number_coefficients, &(feature_vector[i / hop_length][0]));
+        cv::sqrt(Xrf + Xif, cv_feature);
+    }
+    cv::Mat_<float> cv_mag;
+    cv::transpose(feature_vector, cv_mag);
+    cv::Mat_<double> mag;
+    cv_mag.convertTo(mag, CV_64FC1);
+
+    return mag;
+}
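`MagnitudeSpectrogram` centers each frame by padding `n_fft / 2` samples on both sides with `cv::BORDER_REFLECT_101`, which mirrors the signal without repeating the edge sample. A sketch of that padding and the resulting frame count, assuming the pad is shorter than the signal; `reflectPad101` and `stftFrameCount` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Mirror-pad `pad` samples on each side without repeating the edge sample
// (the layout of cv::BORDER_REFLECT_101). Assumes 0 <= pad < signal.size().
std::vector<float> reflectPad101(const std::vector<float>& signal, std::size_t pad) {
    const std::ptrdiff_t n = static_cast<std::ptrdiff_t>(signal.size());
    const std::ptrdiff_t p = static_cast<std::ptrdiff_t>(pad);
    assert(n > 0 && p < n);
    std::vector<float> out;
    out.reserve(signal.size() + 2 * pad);
    for (std::ptrdiff_t i = -p; i < n + p; ++i) {
        std::ptrdiff_t src = i;
        if (src < 0) src = -src;                // mirror around index 0
        if (src >= n) src = 2 * (n - 1) - src;  // mirror around index n-1
        out.push_back(signal[static_cast<std::size_t>(src)]);
    }
    return out;
}

// Frame count for a padded buffer: (size - n_fft) / hop_length + 1.
int stftFrameCount(int paddedSize, int nFft, int hopLength) {
    return (paddedSize - nFft) / hopLength + 1;
}
```

Because the padded length is L + n_fft, the frame count simplifies to L / hop_length + 1, the same `/160+1` that `DhWenet::cntmel` uses.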
+
+//cv::Mat_<double> log_mel(std::vector<uint8_t> &ifile_data, int nSamples_per_sec) {
+int log_mel(float* ifile_data, int ifile_length,int nSamples_per_sec,float* ofile_data) {
+    if (nSamples_per_sec != nSamplesPerSec) {
+        return -1;//cv::Mat_<double>(0, 0);
+    }
+    cv::Mat_<float> d1(1, ifile_length - 1, (float *) (ifile_data) + 1);
+    cv::Mat_<float> d2(1, ifile_length-1 , (float *) (ifile_data));
+
+    cv::Mat_<float> cv_emphasis_data;
+
+    cv::hconcat(cv::Mat_<float>::zeros(1, 1), d1 - d2 * preemphasis, cv_emphasis_data);
+    auto mag = MagnitudeSpectrogram(&cv_emphasis_data, length_DFT, hop_length, win_length);
+    auto magb = cv::abs(mag);
+    cv::pow(magb,2,mag);
+
+    // Build the mel filterbank on first use and cache it in a static
+    if (mel_basis.empty()) {
+        mel_basis = mel_spectrogram_create(nSamplesPerSec, length_DFT, number_filterbanks);
+    }
+
+    cv::Mat cv_mel = mel_basis * mag;
+    cv::log(cv_mel+ 1e-5, cv_mel);
+    cv_mel = cv_mel / 2.3025850929940459 * 10; // 2.3025850929940459=log(10)
+
+    cv_mel = cv_mel - ref_db;
+    cv::Mat cv_mel_r;//(cv_mel.cols,cv_mel.rows,CV_64FC1,ofile_data);
+    cv::transpose(cv_mel, cv_mel_r);
+    //cv::Mat rcv(cv_mel_r.cols,cv_mel_r.rows, CV_32FC1,ofile_data);
+    cv::Mat rrr(cv_mel.cols,cv_mel.rows,CV_32FC1,ofile_data);
+    cv_mel_r.convertTo(rrr, CV_32FC1);
+
+    if (r == 1) {
+        // The original formula is:
+        // mel = mel[:len(mel) // hp.r * hp.r].reshape([len(mel) // hp.r, hp.r * hp.n_mels])
+        // With r = 1 this operation leaves the values unchanged.
+    } else {
+        //std::cout << R"(the "r" is not 1.)" << std::endl;
+    }
+    return 0;
+}
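Two scalar steps of `log_mel` are easy to check in isolation: the pre-emphasis filter, whose first output is forced to 0 by the `hconcat` with a zero, and the conversion of mel power to decibels, where dividing the natural log by ln(10) = 2.302585... and multiplying by 10 gives 10 * log10. A sketch under that reading of the code; the function names are illustrative.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pre-emphasis as implemented in log_mel(): y[0] = 0, y[n] = x[n] - coeff * x[n-1].
std::vector<float> preEmphasis(const std::vector<float>& x, float coeff = 0.97f) {
    std::vector<float> y(x.size(), 0.0f);
    for (std::size_t n = 1; n < x.size(); ++n) y[n] = x[n] - coeff * x[n - 1];
    return y;
}

// Power to dB with the 1e-5 floor, minus ref_db (20 in the source):
// log(p + 1e-5) / ln(10) * 10 == 10 * log10(p + 1e-5)
double powerToDb(double power, double refDb = 20.0) {
    return 10.0 * std::log10(power + 1e-5) - refDb;
}
```

The 1e-5 floor bounds the output from below: silent bins come out at 10 * log10(1e-5) - 20 = -70 dB instead of negative infinity.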
+
+/**--------------------------------- PCEN computation routines below ---------------------------------**/
+
+// scipy.signal.lfilter_zi()
+static cv::Mat_<double> cvlfilter_zi(cv::Mat_<double> b, cv::Mat_<double> a) {
+    if ((b.rows != 1) || (a.rows != 1)) {
+        //std::cout << "Numerator b and Denominator a must be 1-D." << std::endl;
+    }
+    if (a(0, 0) != 1) {
+        // Normalize the coefficients so a[0] == 1.
+        b = b / a(0, 0);
+        a = a / a(0, 0);
+    }
+    int len_a = a.cols * a.rows;
+    int len_b = b.cols * b.rows;
+    int n = len_a > len_b ? len_a : len_b;
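+    if (len_a < n) {
+        cv::hconcat(a, cv::Mat_<double>::zeros(1, n - len_a), a);
+    } else if (len_b < n) {
+        cv::hconcat(b, cv::Mat_<double>::zeros(1, n - len_b), b);
+    }
+    // Incomplete port: the zi computation itself is not implemented, so an empty matrix is returned.
+    return cv::Mat_<double>(0, 0);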
+}
+/*
+// scipy.signal.lfilter()
+// Filter data along one-dimension with an IIR or FIR filter.
+cv::Mat_<double> cvlfilter(cv::Mat_<double> &b, cv::Mat_<double> &a, cv::Mat_<double> &x,
+                           cv::Mat_<double> &zi, int axis = -1) {
+    if (a.rows * a.cols == 1) {
+        // This path only supports types fdgFDGO to mirror _linear_filter below.
+        // Any of b, a, x, or zi can set the dtype, but there is no default
+        // casting of other types; instead a NotImplementedError is raised.
+        // To be filled in later if needed
+    } else {
+        // return sigtools._linear_filter(b, a, x, axis, zi)
+        // sigtools._linear_filter()
+        // (y,Vf) = _linear_filter(b,a,X,Dim=-1,Vi=None)  implemented using Direct Form II transposed flow diagram.
+        // If Vi is not given, Vf is not returned.
+        ;
+    }
+}
+*/
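+/*********************************************
+ * Name:    pcen
+ * Purpose: takes audio data and returns features extracted with PCEN
+ *          (per-channel energy normalization).
+ * Params:  @ifile_data        input audio data (raw bytes of 32-bit floats)
+ *          @nSamples_per_sec  audio sample rate
+ * Returns: cv::Mat_<double>   feature data
+*********************************************/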
+static cv::Mat_<double> pcen(std::vector<uint8_t> &ifile_data, int nSamples_per_sec) {
+    //if (!(&ifile_data) || ifile_data.empty()) {
+    if (ifile_data.empty()) {
+        //std::cout << "error: invalid paramter: ifile_data" << std::endl;
+        return cv::Mat_<double>(0, 0);
+    }
+    if (nSamples_per_sec != nSamplesPerSec) {
+//        std::cout << R"(error: the "nSamples_per_sec" is not 16000.)" << std::endl;
+        return cv::Mat_<double>(0, 0);
+    }
+    int ifile_length = int(ifile_data.size() / 4);
+    cv::Mat_<float> cv_emphasis_data(1, ifile_length, (float *) (ifile_data.data()));
+//    std::cout<<ifile_length<<"====="<<cv_emphasis_data[0][960000-1]<<std::endl;
+    //getchar();
+
+    // magnitude spectrogram
+    auto mag = MagnitudeSpectrogram(&cv_emphasis_data, length_DFT, hop_length, win_length);
+    mag = cv::abs(mag) * std::pow(2, 31);
+
+    // generate the mel spectrogram       // ~3 ms
+    if (mel_basis.empty()) {
+        mel_basis = mel_spectrogram_create(nSamplesPerSec, length_DFT, number_filterbanks);
+    }
+
+    // doc
+    cv::Mat_<double> mel = mel_basis * mag;
+
+#if 1 
+    if (!filter) {
+        filter = std::make_shared<IIR_I>();
+        double iir_b[1] = {0.05638943879134889};
+        double iir_a[2] = {1.0, -0.9436105612086512};
+        //filter.reset();
+        filter->setPara(iir_b, 1, iir_a, 2);
+    }
+    cv::Mat_<double> S_smooth = cv::Mat_<double>(mel.rows, mel.cols);
+    for (int i = 0; i < mel.rows; i++) {
+        filter->filter(mel[i], S_smooth[i], mel.cols);
+    }
+
+#endif
+    double gain = 0.98;
+    double bias = 2.0;
+    double power = 0.5;
+    double eps = 1e-6;
+    //python: smooth = np.exp(-gain * (np.log(eps) + np.log1p(S_smooth / eps)))
+    cv::Mat_<double> S_smooth_log1p;
+    cv::log(S_smooth / eps + 1, S_smooth_log1p);
+    cv::Mat_<double> smooth;
+    cv::exp((S_smooth_log1p + cv::log(eps)) * (-gain), smooth);
+    //python: S_out = (bias ** power) * np.expm1(power * np.log1p(ref * smooth / bias))
+    cv::Mat_<double> smooth_log1p;
+    cv::Mat_<double> smooth_log1p_exp;
+    cv::log(mel.mul(smooth) / bias + 1, smooth_log1p);
+    cv::exp(power * smooth_log1p, smooth_log1p_exp);
+    cv::Mat_<double> S_out = (smooth_log1p_exp - 1) * pow(bias, power);
+    // transpose
+    cv::Mat_<double> pcen;
+    cv::transpose(S_out, pcen);
+
+    return pcen;
+}
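The `cv::log`/`cv::exp` chain in `pcen()` simplifies algebraically. Since log(eps) + log1p(M / eps) = log(eps + M), the smoother term is (eps + M)^(-gain), and bias^power * expm1(power * log1p(E * smooth / bias)) equals (E * smooth + bias)^power - bias^power. The sketch below compares the literal form with the closed form for one element. E is a mel energy and M its IIR-smoothed value. It uses `std::log1p`/`std::expm1` where the source uses log(x + 1) and exp(x) - 1.

```cpp
#include <cassert>
#include <cmath>

// Per-element PCEN arithmetic; parameter values copied from pcen().
struct PcenParams {
    double gain = 0.98, bias = 2.0, power = 0.5, eps = 1e-6;
};

// Literal transcription of the log/exp sequence in pcen().
double pcenAsWritten(double E, double M, const PcenParams& p = {}) {
    double smooth = std::exp(-p.gain * (std::log(p.eps) + std::log1p(M / p.eps)));
    return std::pow(p.bias, p.power) * std::expm1(p.power * std::log1p(E * smooth / p.bias));
}

// Algebraically equivalent closed form:
//   smooth = (eps + M)^(-gain)
//   out    = (E * smooth + bias)^power - bias^power
double pcenClosedForm(double E, double M, const PcenParams& p = {}) {
    double smooth = std::pow(p.eps + M, -p.gain);
    return std::pow(E * smooth + p.bias, p.power) - std::pow(p.bias, p.power);
}
```

The closed form makes the role of each parameter visible: `gain` sets how strongly the smoothed energy M normalizes E, while `bias` and `power` apply a root compression offset so that E = 0 maps to 0.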
--- a/duix-sdk/src/main/cpp/dhmfcc/mfcc/AudioFFT.hpp
+++ b/duix-sdk/src/main/cpp/dhmfcc/mfcc/AudioFFT.hpp
@ -0,0 +1,120 @@
+#pragma once
+
+#ifndef _AUDIOFFT_H
+#define _AUDIOFFT_H
+
+
+
+#include <cstddef>
+#include <memory>
+#include <cassert>
+#include <cmath>
+#include <cstring>
+
+//#define AUDIOFFT_APPLE_ACCELERATE //AUDIOFFT_INTEL_IPP//AUDIOFFT_FFTW3//AUDIOFFT_APPLE_ACCELERATE
+
+#if defined(AUDIOFFT_INTEL_IPP)
+#define AUDIOFFT_INTEL_IPP_USED
+  #include <ipp.h>
+#elif defined(AUDIOFFT_APPLE_ACCELERATE)
+#define AUDIOFFT_APPLE_ACCELERATE_USED
+  #include <Accelerate/Accelerate.h>
+  #include <vector>
+#elif defined (AUDIOFFT_FFTW3)
+#define AUDIOFFT_FFTW3_USED
+  #include <fftw3.h>
+#else
+#if !defined(AUDIOFFT_OOURA)
+#define AUDIOFFT_OOURA
+#endif
+#define AUDIOFFT_OOURA_USED
+#include <vector>
+#endif
+
+namespace audiofft
+{
+
+    namespace detail
+    {
+        class AudioFFTImpl;
+    }
+
+    /**
+     * @class AudioFFT
+     * @brief Performs 1D FFTs
+     */
+    class AudioFFT
+    {
+    public:
+        /**
+         * @brief Constructor
+         */
+        AudioFFT();
+
+        AudioFFT(const AudioFFT&) = delete;
+        AudioFFT& operator=(const AudioFFT&) = delete;
+
+        /**
+         * @brief Destructor
+         */
+        ~AudioFFT();
+
+        /**
+         * @brief Initializes the FFT object
+         * @param size Size of the real input (must be a power of 2)
+         */
+        void init(size_t size);
+
+        /**
+         * @brief Performs the forward FFT
+         * @param data The real input data (has to be of the length as specified in init())
+         * @param re The real part of the complex output (has to be of length as returned by ComplexSize())
+         * @param im The imaginary part of the complex output (has to be of length as returned by ComplexSize())
+         */
+        void fft(const float* data, float* re, float* im);
+
+        /**
+         * @brief Performs the inverse FFT
+         * @param data The real output data (has to be of the length as specified in init())
+         * @param re The real part of the complex input (has to be of length as returned by ComplexSize())
+         * @param im The imaginary part of the complex input (has to be of length as returned by ComplexSize())
+         */
+        void ifft(float* data, const float* re, const float* im);
+
+        /**
+         * @brief Calculates the necessary size of the real/imaginary complex arrays
+         * @param size The size of the real data
+         * @return The size of the real/imaginary complex arrays
+         */
+        static size_t ComplexSize(size_t size);
+
+    private:
+        std::unique_ptr<detail::AudioFFTImpl> _impl;
+    };
+
+
+    /**
+     * @deprecated
+     * @brief AudioFFTBase is kept as an alias because it existed in the first version; removing it would break existing code.
+     */
+    typedef AudioFFT AudioFFTBase;
+
+    namespace detail
+    {
+        class AudioFFTImpl
+        {
+        public:
+            AudioFFTImpl() = default;
+            AudioFFTImpl(const AudioFFTImpl&) = delete;
+            AudioFFTImpl& operator=(const AudioFFTImpl&) = delete;
+            virtual ~AudioFFTImpl() = default;
+            virtual void init(size_t size) = 0;
+            virtual void fft(const float* data, float* re, float* im) = 0;
+            virtual void ifft(float* data, const float* re, const float* im) = 0;
+        };
+    }
+
+} // End of namespace
+
+
+#endif // Header guard
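`MagnitudeSpectrogram` allocates `n_fft / 2 + 1` real and imaginary values per frame for `AudioFFT::fft`, which suggests that `ComplexSize(n)` returns n / 2 + 1. Below is a slow reference DFT with that output layout, useful for spot-checking an FFT backend on small inputs. The sign convention of the imaginary part is an assumption and may differ from the OOURA backend; magnitudes do not depend on it. `naiveRealDft` is a hypothetical helper.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// O(N^2) real-input DFT returning N/2 + 1 bins as separate real and
// imaginary arrays, the layout MagnitudeSpectrogram() reads from fft().
void naiveRealDft(const std::vector<float>& data, std::vector<float>& re, std::vector<float>& im) {
    assert(!data.empty());
    const std::size_t n = data.size();
    const std::size_t bins = n / 2 + 1;
    const double twoPi = 2.0 * std::acos(-1.0);
    re.assign(bins, 0.0f);
    im.assign(bins, 0.0f);
    for (std::size_t k = 0; k < bins; ++k) {
        double sr = 0.0, si = 0.0;
        for (std::size_t t = 0; t < n; ++t) {
            double angle = twoPi * static_cast<double>(k * t) / static_cast<double>(n);
            sr += data[t] * std::cos(angle);
            si -= data[t] * std::sin(angle);  // forward transform: e^{-i * angle}
        }
        re[k] = static_cast<float>(sr);
        im[k] = static_cast<float>(si);
    }
}
```

The magnitude of bin k is then sqrt(re[k]^2 + im[k]^2), which is what `MagnitudeSpectrogram` stores per frame.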
--- a/duix-sdk/src/main/cpp/dhmfcc/mfcc/iir_filter.hpp
+++ b/duix-sdk/src/main/cpp/dhmfcc/mfcc/iir_filter.hpp
@ -0,0 +1,69 @@
+#pragma once
+
+#ifndef SERVICESUPERVISOR_IIR_FILTER_H
+#define SERVICESUPERVISOR_IIR_FILTER_H
+
+//E(t,f) is computed using a first-order infinite impulse response (IIR) filter
+#define UES_IIR_I
+//#define UES_IIR_II
+
+#ifdef UES_IIR_I
+
+class IIR_I
+{
+private:
+    double *m_pNum;
+    double *m_pDen;
+    double *m_px;
+    double *m_py;
+    int m_num_order;
+    int m_den_order;
+public:
+    IIR_I();
+    ~IIR_I();
+    void reset();
+    void setPara(double num[], int num_order, double den[], int den_order);
+    void resp(double data_in[], int m, double data_out[], int n);
+    void filter(double data_in[], double data_out[], int len);
+};
+
+#endif
+
+#ifdef UES_IIR_II
+#include <complex>  // std::complex used by IIR_BODE
+class IIR_II
+{
+public:
+    IIR_II();
+    void reset();
+    void setPara(double num[], int num_order, double den[], int den_order);
+    void resp(double data_in[], int m, double data_out[], int n);
+    double filter(double data);
+    void filter(double data[], int len);
+    void filter(double data_in[], double data_out[], int len);
+protected:
+private:
+    double *m_pNum;
+    double *m_pDen;
+    double *m_pW;
+    int m_num_order;
+    int m_den_order;
+    int m_N;
+};
+
+class IIR_BODE
+{
+private:
+    double *m_pNum;
+    double *m_pDen;
+    int m_num_order;
+    int m_den_order;
+    std::complex<double> poly_val(double p[], int order, double omega);
+public:
+    IIR_BODE();
+    void setPara(double num[], int num_order, double den[], int den_order);
+    std::complex<double> bode(double omega);
+    void bode(double omega[], int n, std::complex<double> resp[]);
+};
+#endif
+
+#endif
--- a/duix-sdk/src/main/cpp/dhmfcc/mfcc/mfcc.hpp
+++ b/duix-sdk/src/main/cpp/dhmfcc/mfcc/mfcc.hpp
@ -0,0 +1,7 @@
+#pragma once
+
+//#include"../third/numcpp/NumCpp.hpp"
+//#include "sas_util.h"
+
+
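+// Computes a log-mel spectrogram of ifile_length float samples
+// (80 mel bands, 1024-point FFT, hop 160, window 800, pre-emphasis 0.97).
+// nSamples_per_sec must be 16000, otherwise -1 is returned. On success the
+// result, 10*log10(mel power + 1e-5) - 20, is written row-major to ofile_data
+// as (ifile_length / 160 + 1) frames x 80 bands, and 0 is returned.
+int log_mel(float* ifile_data, int ifile_length,int nSamples_per_sec,float* ofile_data);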
--- a/duix-sdk/src/main/cpp/dhmfcc/mfcc/sas_util.h
+++ b/duix-sdk/src/main/cpp/dhmfcc/mfcc/sas_util.h
@ -0,0 +1,120 @@
+#pragma once
+
+#include <string>
+#include <chrono>
+#include <vector>
+#include <assert.h>
+#include <memory>
+#include <fstream>
+#include "opencv2/core.hpp"
+
+//using namespace std;
+//using namespace std::chrono;
+
+class parambase
+{
+public:
+    std::string name;
+    std::string help;
+    std::string strval;
+    parambase(){}
+    virtual  ~parambase(){}
+    virtual bool set(const char* value) {return true;};
+};
+
+/**
+ */
+class EnginePar
+{
+public:
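+    static int cs_timeout; // timeout for completing a queue-call service (by default the next call marks the previous one complete; default 5 minutes)
+    static int cs_detecthandsup_time; // how long hand-raise detection continues after a call (default 10 s)
+    static int cs_detecthandsup_interval ; // hand-raise detection interval after a call (default once per second)
+    static int cs_detectsmile_interval; // smile detection interval after a call (default once per second)
+    static int cs_detectspeech_interval;// speech detection interval after a call (default 20 s)
+    static int cs_detectpose_interval;  // pose detection interval after a call (default once every 5 s)
+    static int detectpose_interval;     // pose detection interval outside calls (default once every 5 s)
+    static int detectsmile_interval;    // smile detection interval outside calls (default once per second)
+    static int detectappearance_interval; // attire detection interval
+    static float action_turnpen_thrd;   // pen-spinning threshold
+    static float action_turnchair_thrd; // chair-swiveling threshold
+    static float action_record_time;    // action recording duration
+    static float sit_supporthead_thrd;  // head-propping threshold
+    static float sit_layondesk_thrd;    // lying-on-desk threshold
+    static float sit_relyingonchair_thrd;// leaning-back-in-chair threshold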
+    static std::string log_path;
+    static std::string log_level;
+    static std::string temp_path;
+	static bool set(const char* key, const char* val);
+	static bool haskey(const char* key);
+	static const char* getvalue(const char* key);
+};
+/**
+ */
+enum VideoScene
+{
+    SCENE_counter,    // counter
+    SCENE_financial,   // wealth management
+    SCENE_lobby,       // lobby
+    SCENE_hall             // entrance hall
+};
+/**
+ */
+class VideoPar
+{
+private:
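+    std::vector<std::shared_ptr<parambase>> params;
+public:
+    VideoScene scene;            // scene: 1 counter, 2 wealth management, 3 lobby, 4 entrance (attire detection)
+    bool audio_enable ;          // audio switch: 1 on, 0 off
+    int audio_channels ;         // number of audio channels: 0, 1, 2, 4, 6
+    int audio_sample_rate ;      // sample rate: 44100, 48000, 96000, 192000
+    bool video_enable ;          // video switch: 1 on, 0 off
+    //int video_analyse_rate ;   // video analysis rate: number > 0, frames analyzed per second
+    bool video_sample_keyframe;  // decode keyframes only
+    bool video_record;           // enable video recording: 1 on, 0 off
+    int video_record_duration;   // video recording duration, default 10 s
+    int video_record_reviewtime; // video recording look-back duration, default 5 s
+    int face_minsize;            // minimum face size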
+    VideoPar();
+    //~VideoPar();
+    bool set(const char* key, const char* val);
+    static bool haskey(const char* key);
+};
+
+template<class T>
+inline int64_t NowTime()
+{
+	return std::chrono::time_point_cast<T>(std::chrono::system_clock::now()).time_since_epoch().count();
+}
+
+/**--------------------------------- Helper functions used by the models ---------------------------------**/
+
+// Returns true if the file at file_path can be opened for reading.
+inline bool detectFileExist(const char *file_path) {
+    std::ifstream _ifstream(file_path, std::ios::in);
+    return _ifstream.is_open();
+}
+
+// Matrix transform: rotate the row vector xy by angle (radians), i.e. return xy * R^T
+inline cv::Mat_<double> rotate_point(cv::Mat_<double> xy, double angle) {
+    cv::Mat rotate_matrix = (cv::Mat_<double>(2, 2) << cos(angle), -sin(angle), sin(angle), cos(angle));
+    cv::transpose(rotate_matrix, rotate_matrix);
+    auto rotate_xy = xy * rotate_matrix;
+    return rotate_xy;
+}
+
+// Check whether a point lies strictly inside a rectangle
+inline bool check_point_in_rect(cv::Point point, cv::Rect rect) {
+    if ((rect.x < point.x && point.x < rect.x + rect.width) &&
+        (rect.y < point.y && point.y < rect.y + rect.height)) {
+        return true;  // inside rect
+    } else {
+        return false; // on the border of rect or outside it
+    }
+}
+
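The two geometry helpers above can be checked without OpenCV. The following is a minimal sketch, assuming angles in radians; `Point2`, `rotate_point_plain` and `point_strictly_inside` are hypothetical names, not part of the SDK. It restates what `rotate_point` computes (the row vector `xy` times the transposed rotation matrix, which works out to the usual rotation `(x cos a - y sin a, x sin a + y cos a)`) and the strict comparisons in `check_point_in_rect`, under which border points count as outside.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical plain-C++ stand-in for cv::Point2d (not part of the SDK).
struct Point2 { double x, y; };

// Same math as rotate_point: the row vector [x y] times R^T, where
// R = [[cos a, -sin a], [sin a, cos a]]. The product equals R * [x y]^T,
// i.e. a rotation by `a` radians (counter-clockwise in a y-up frame).
inline Point2 rotate_point_plain(Point2 p, double a) {
    const double c = std::cos(a), s = std::sin(a);
    return { p.x * c - p.y * s, p.x * s + p.y * c };
}

// Same test as check_point_in_rect: strict comparisons, so a point on the
// border (e.g. px == rx) is reported as outside.
inline bool point_strictly_inside(int px, int py,
                                  int rx, int ry, int rw, int rh) {
    assert(rw >= 0 && rh >= 0);  // assumes a normalized rectangle
    return rx < px && px < rx + rw && ry < py && py < ry + rh;
}
```

For example, rotating `(1, 0)` by π/2 gives approximately `(0, 1)`, and a point with `x == rect.x` is reported as outside.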
--- a/duix-sdk/src/main/cpp/dhmfcc/wenetai.cpp
+++ b/duix-sdk/src/main/cpp/dhmfcc/wenetai.cpp
@ -0,0 +1,119 @@
+#include "wenetai.h"
+WeAI::WeAI(int melcnt,int bnfcnt,int trd){
+  n_trd = trd;
+  dimin = melcnt;
+  dimout = bnfcnt;
+  sizein = melcnt*80*sizeof(float);
+  sizeout = bnfcnt*256*sizeof(float);
+  shapein[1] = melcnt;
+  shapeout[1] = bnfcnt;
+  buflen[0] = melcnt;
+
+  bufin = (float*)malloc(sizein+1024);
+  bufout = (float*)malloc(sizeout+1024);
+}
+
+WeAI::~WeAI(){
+  free(bufin);
+  free(bufout);
+}
+
+int WeAI::dorun(float* mel,int melcnt,float* bnf,int bnfcnt){
+  return 0;
+}
+
+
+int WeAI::run(float* mel,int melcnt,float* bnf,int bnfcnt){
+  dimin = melcnt;
+  dimout = bnfcnt;
+  sizein = melcnt*80*sizeof(float);
+  sizeout = bnfcnt*256*sizeof(float);
+  shapein[1] = melcnt;
+  shapeout[1] = bnfcnt;
+  buflen[0] = melcnt;
+  return dorun(mel,melcnt,bnf,bnfcnt);
+}
+
+int WeAI::test(){
+  return dorun(bufin,dimin,bufout,dimout);
+}
+
+int WeOnnx::dorun(float* mel,int melcnt,float* bnf,int bnfcnt){
+  //
+  Ort::Value arrin[2] = {Ort::Value::CreateTensor( memoryInfo, mel ,sizein ,  shapein, 3 ,ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT),Ort::Value::CreateTensor( memoryInfo, buflen ,sizelen ,  shapelen, 1 ,ONNX_TENSOR_ELEMENT_DATA_TYPE_INT32)};
+  Ort::Value arrout[1] = {Ort::Value::CreateTensor( memoryInfo, bnf ,sizeout ,  shapeout, 3 ,ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT)};
+  session.Run(runOptions, names_in, arrin, 2, names_out,arrout, 1);
+  return 0;
+}
+
+WeOnnx::WeOnnx(std::string modelfn,int mel,int bnf,int trd):WeAI(mel,bnf,trd){
+  //
+  env = Ort::Env(OrtLoggingLevel::ORT_LOGGING_LEVEL_WARNING, "wenet");
+  sessionOptions = Ort::SessionOptions();
+  // Note: the intra-op thread count is fixed at 2; n_trd is not used here.
+  //sessionOptions.SetIntraOpNumThreads(n_trd);
+  sessionOptions.SetIntraOpNumThreads(2);
+  // todo jth add
+  //sessionOptions.SetIntraOpNumThreads(1);
+  //sessionOptions.SetInterOpNumThreads(1);
+  sessionOptions.AddConfigEntry("session.disable_prepacking", "1");
+  sessionOptions.SetGraphOptimizationLevel( GraphOptimizationLevel::ORT_ENABLE_ALL);
+  session = Ort::Session(env, modelfn.c_str(), sessionOptions);
+  memoryInfo = Ort::MemoryInfo::CreateCpu( OrtAllocatorType::OrtDeviceAllocator, OrtMemType::OrtMemTypeCPU);
+  //Ort::MemoryInfo::CreateCpu( OrtAllocatorType::OrtArenaAllocator, OrtMemType::OrtMemTypeDefault);
+  //tensorin = Ort::Value::CreateTensor( memoryInfo, bufin ,sizein ,  shapein, 3 ,ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT);
+  //tensorlen = Ort::Value::CreateTensor( memoryInfo, buflen ,sizelen ,  shapelen, 1 ,ONNX_TENSOR_ELEMENT_DATA_TYPE_INT32);
+  //tensorout = Ort::Value::CreateTensor( memoryInfo, bufout ,sizeout ,  shapeout, 3 ,ONNX_TENSOR_ELEMENT_DATA_TYPE_FLOAT);
+}
+
+WeOnnx::~WeOnnx(){
+}
+
+
+#ifdef WENETOPENV
+#include <iostream>
+int WeOpvn::dorun(float* mel,int melcnt,float* bnf,int bnfcnt){
+	printf("====opvn run %d \n",sizeout);
+	std::cout<<ainput_shape<<std::endl;
+	std::cout<<aoutput_shape<<std::endl;
+  ov::Tensor ainput_tensor = ov::Tensor(ainput_type, ainput_shape, mel);
+  ov::Tensor binput_tensor = ov::Tensor(binput_type, binput_shape, binput_data);
+  ov::Tensor aoutput_tensor = ov::Tensor(aoutput_type, aoutput_shape, bnf);
+  infer_request.set_input_tensor(0,ainput_tensor);
+  infer_request.set_input_tensor(1,binput_tensor);
+  infer_request.set_output_tensor(0,aoutput_tensor);
+  infer_request.infer();
+  //const ov::Tensor& output_tensor = infer_request.get_output_tensor();
+  //const float* data = (float*)output_tensor.data();//<const float>();
+  //memcpy(bnf,data,sizeout);
+  return 0;
+}
+
+WeOpvn::WeOpvn(std::string modelfn,std::string xmlfn,int mel,int bnf,int trd):WeAI(mel,bnf,trd){
+  std::shared_ptr<ov::Model>  model = core.read_model(xmlfn,modelfn);
+  ov::preprocess::PrePostProcessor ppp(model);
+
+  ov::preprocess::InputInfo& ainfo = ppp.input(aname);  
+  ov::preprocess::InputInfo&  binfo = ppp.input(bname);  
+  ainput_shape[1] = mel;
+  aoutput_shape[1] = bnf;
+  binput_data[0] = mel;
+  ainfo.tensor().set_element_type(ainput_type).set_shape(ainput_shape);
+  binfo.tensor().set_element_type(binput_type).set_shape(binput_shape);
+  ainfo.preprocess();                                                                             //
+  binfo.preprocess();                                                                             //
+  ov::preprocess::OutputInfo&  aout = ppp.output(cname);  
+  aout.tensor().set_element_type(aoutput_type);
+
+  model = ppp.build();
+  std::string device_name = "CPU";
+  ov::CompiledModel  compiled_model = core.compile_model(model, device_name,
+      ov::inference_num_threads(int(n_trd)) );
+
+  infer_request = compiled_model.create_infer_request();
+  //
+  //model = nullptr;
+}
+
+WeOpvn::~WeOpvn(){
+
+}
+#endif
--- a/duix-sdk/src/main/cpp/dhmfcc/wenetai.h
+++ b/duix-sdk/src/main/cpp/dhmfcc/wenetai.h
@ -0,0 +1,92 @@
+#pragma once
+#include <stdint.h>
+#include <stdio.h>
+#include <string>
+#include <vector>
+#include <stdlib.h>
+
+
+class WeAI{
+  protected:
+    int n_trd = 4;
+    int dimin = 321;
+    int dimout = 78;
+    int dimlen = 1;
+    int64_t sizein = 321*80*sizeof(float);
+    int64_t sizeout = 78*256*sizeof(float);
+    int64_t sizelen = sizeof(int32_t);
+    float* bufin = NULL;
+    float* bufout = NULL;
+    int32_t buflen[1]; 
+    int64_t shapein[3]={1,321,80};
+    int64_t shapelen[1]={1};
+    int64_t shapeout[3]={1,78,256};
+    const char* names_in[2]={"speech","speech_lengths"};
+    const char* names_out[1]={"encoder_out"};
+
+    virtual int dorun(float* mel,int melcnt,float* bnf,int bnfcnt);
+  public:
+    WeAI(int melcnt,int bnfcnt,int trd=4);
+    int run(float* mel,int melcnt,float* bnf,int bnfcnt);
+    int test();
+    virtual ~WeAI();
+};
+
+
+#define WENETONNX  1
+#ifdef WENETONNX
+#include "onnxruntime_cxx_api.h"
+class WeOnnx:public WeAI{
+  protected:
+
+    //Ort::Value tensorin{nullptr};
+    //Ort::Value tensorlen{nullptr};
+    //Ort::Value tensorout{nullptr};
+
+    Ort::Env env{nullptr};
+    Ort::SessionOptions sessionOptions{nullptr};
+    Ort::RunOptions runOptions;
+    Ort::Session session{nullptr};
+    Ort::MemoryInfo memoryInfo{nullptr};
+  protected:
+    virtual int dorun(float* mel,int melcnt,float* bnf,int bnfcnt);
+  public:
+    WeOnnx(std::string modelfn,int mel,int bnf,int trd);
+    virtual ~WeOnnx();
+};
+#endif
+
+#ifdef WENETMNN
+class WeMnn:public WeAI{
+};
+#endif
+
+
+//#define WENETOPENV 
+#ifdef WENETOPENV
+#include "openvino/openvino.hpp"
+class WeOpvn:public WeAI{
+  private:
+    ov::element::Type ainput_type = ov::element::f32;
+    ov::element::Type binput_type = ov::element::i32;
+    ov::element::Type aoutput_type = ov::element::f32;
+
+    ov::Shape ainput_shape = {1, 321,80};
+    ov::Shape binput_shape = {1};
+    ov::Shape aoutput_shape = {1, 79,256};
+
+
+    int32_t  binput_data[1];
+
+    ov::Core core;
+    ov::InferRequest infer_request ;
+    std::string aname = "speech";
+    std::string bname = "speech_lengths";
+    std::string cname = "encoder_out";
+  protected:
+    virtual int dorun(float* mel,int melcnt,float* bnf,int bnfcnt);
+  public:
+    WeOpvn(std::string modelfn,std::string xmlfn,int mel,int bnf,int trd);
+    virtual ~WeOpvn();
+};
+#endif
+
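`WeAI` keeps its buffer sizes in bytes (frames × feature width × `sizeof(float)`), which is what the untyped `Ort::Value::CreateTensor` overload used in `WeOnnx::dorun` expects alongside the shape and element type. A minimal sketch of that bookkeeping, assuming the feature widths 80 and 256 from the `shapein`/`shapeout` defaults; `WeAISizes` and `we_ai_sizes` are hypothetical helpers for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper mirroring the size bookkeeping in WeAI::WeAI / WeAI::run.
// Feature widths 80 (mel bins) and 256 (encoder output dim) are taken from
// the shapein/shapeout defaults in wenetai.h.
struct WeAISizes {
    int64_t shape_in[3];   // {1, melcnt, 80}
    int64_t shape_out[3];  // {1, bnfcnt, 256}
    size_t bytes_in;       // byte count for the "speech" tensor
    size_t bytes_out;      // byte count for the "encoder_out" tensor
};

inline WeAISizes we_ai_sizes(int melcnt, int bnfcnt) {
    assert(melcnt > 0 && bnfcnt > 0);
    WeAISizes s{};
    s.shape_in[0] = 1;  s.shape_in[1] = melcnt;  s.shape_in[2] = 80;
    s.shape_out[0] = 1; s.shape_out[1] = bnfcnt; s.shape_out[2] = 256;
    s.bytes_in  = static_cast<size_t>(melcnt) * 80 * sizeof(float);
    s.bytes_out = static_cast<size_t>(bnfcnt) * 256 * sizeof(float);
    return s;
}
```

With the header defaults (321 mel frames, 78 output frames) and a 4-byte `float`, this gives 102720 input bytes and 79872 output bytes.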
--- a/duix-sdk/src/main/cpp/dhmfcc/wenetov.cpp
+++ b/duix-sdk/src/main/cpp/dhmfcc/wenetov.cpp
@ -0,0 +1,134 @@
+// Copyright (C) 2018-2025 Intel Corporation
+// SPDX-License-Identifier: Apache-2.0
+//
+
+#include <cstdint>
+#include <cstdio>
+#include <cstdlib>
+#include <cstring>
+#include <iterator>
+#include <memory>
+#include <sstream>
+#include <string>
+#include <vector>
+#include <sys/timeb.h>
+#include <unistd.h>
+#include <time.h>
+
+// clang-format off
+#include "openvino/openvino.hpp"
+#include "openvino/core/preprocess/input_info.hpp"
+
+// Monotonic timestamp in milliseconds (1 ms = 1,000,000 ns).
+uint64_t jtimer_msstamp(){
+  struct timespec ts;
+  clock_gettime(CLOCK_MONOTONIC, &ts);
+  return (uint64_t)ts.tv_sec * 1000ULL + (uint64_t)ts.tv_nsec / 1000000ULL;
+}
+
+// clang-format on
+
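+/**
+ * @brief Standalone OpenVINO benchmark for the WeNet encoder (wenet.xml / wenet.bin):
+ *        runs repeated synchronous inference on a zero-filled input and prints per-call latency.
+ */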
+int main(int argc, char* argv[]) {
+
+        const std::string amodel_path = "wenet.xml";
+        const std::string bmodel_path = "wenet.bin";
+
+        // -------- Step 1. Initialize OpenVINO Runtime Core --------
+        ov::Core core;
+
+        // -------- Step 2. Read a model --------
+        printf("===reading model\n");
+        std::shared_ptr<ov::Model> model = core.read_model(amodel_path,bmodel_path);
+        printf("===model loaded\n");
+        //printInputAndOutputsInfo(*model);
+
+        OPENVINO_ASSERT(model->inputs().size() == 2, "Sample supports models with exactly 2 inputs (speech, speech_lengths)");
+        OPENVINO_ASSERT(model->outputs().size() == 1, "Sample supports models with 1 output only");
+
+        // -------- Step 3. Set up input --------
+
+        // Wrap zero-filled speech features (1 x 321 x 80) and their length (321)
+        // in tensors, without resize or layout conversion
+
+        ov::element::Type ainput_type = ov::element::f32;
+        ov::Shape ainput_shape = {1, 321,80};
+        float*  ainput_data = (float*)malloc(sizeof(float)*321*80);
+        memset(ainput_data,0,sizeof(float)*321*80);
+        ov::element::Type binput_type = ov::element::i32;
+        ov::Shape binput_shape = {1};
+        int32_t*  binput_data = (int32_t*)malloc(sizeof(int32_t));
+        *binput_data = 321;
+
+        // just wrap the input buffers in ov::Tensor without allocating new memory
+        ov::Tensor ainput_tensor = ov::Tensor(ainput_type, ainput_shape, ainput_data);
+        ov::Tensor binput_tensor = ov::Tensor(binput_type, binput_shape, binput_data);
+
+        //const ov::Layout tensor_layout{"NHWC"};
+
+        // -------- Step 4. Configure preprocessing --------
+
+        ov::preprocess::PrePostProcessor ppp(model);
+
+        // 1) Set input tensor information:
+        // - input(name) provides information about a single model input
+        // - reuse precision and shape from the tensors prepared above
+        std::string aname = "speech";
+        ov::preprocess::InputInfo& ainfo = ppp.input(aname);  
+        ainfo.tensor().set_shape(ainput_shape).set_element_type(ainput_type);//set_layout(tensor_layout);
+        std::string bname = "speech_lengths";
+        ov::preprocess::InputInfo& binfo = ppp.input(bname);  
+        binfo.tensor().set_shape(binput_shape).set_element_type(binput_type);//set_layout(tensor_layout);
+        ainfo.preprocess();
+        binfo.preprocess();
+        // Output tensor precision is f32
+        ppp.output().tensor().set_element_type(ov::element::f32);
+
+        // 2) No explicit preprocessing steps (resize, layout conversion) are applied;
+        //    only the tensor shapes and element types are declared above.
+
+        // 3) Apply preprocessing modifying the original 'model'
+        model = ppp.build();
+
+        std::string device_name = "CPU";
+        // -------- Step 5. Loading a model to the device --------
+        ov::CompiledModel compiled_model = core.compile_model(model, device_name,
+          ov::inference_num_threads(int(4))
+        );
+
+        // -------- Step 6. Create an infer request --------
+        ov::InferRequest infer_request = compiled_model.create_infer_request();
+        // -----------------------------------------------------------------------------------------------------
+
+        // -------- Step 7. Prepare input --------
+        infer_request.set_input_tensor(0,ainput_tensor);
+        infer_request.set_input_tensor(1,binput_tensor);
+
+        // -------- Step 8. Do inference synchronously --------
+        for(int k=0;k<10000;k++){
+            uint64_t tick = jtimer_msstamp();
+            infer_request.infer();
+            int dist = (int)(jtimer_msstamp()-tick);
+            printf("===dist %d ms\n",dist);
+            usleep(1000);
+        }
+
+        // -------- Step 9. Process output --------
+        const ov::Tensor& output_tensor = infer_request.get_output_tensor();
+        const float* data = output_tensor.data<const float>();
+        // Print the first 10 values of the encoder output
+        for(int k=0;k<10;k++){
+          printf("===%f \n",data[k]);
+        }
+
+        // The input tensors only wrap these buffers, so they are released here
+        free(ainput_data);
+        free(binput_data);
+
+    return EXIT_SUCCESS;
+}
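`jtimer_msstamp` turns a `CLOCK_MONOTONIC` `timespec` into milliseconds (1 ms = 1,000,000 ns). A minimal sketch of that conversion, plus a portable `std::chrono::steady_clock` variant that could stand in for it in the benchmark loop; `timespec_to_ms` and `monotonic_ms` are hypothetical names.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical helper: the timespec -> milliseconds conversion used by
// jtimer_msstamp (seconds * 1000 + nanoseconds / 1,000,000).
inline uint64_t timespec_to_ms(int64_t tv_sec, long tv_nsec) {
    assert(tv_sec >= 0 && tv_nsec >= 0 && tv_nsec < 1000000000L);
    return static_cast<uint64_t>(tv_sec) * 1000ULL +
           static_cast<uint64_t>(tv_nsec) / 1000000ULL;
}

// Portable alternative: std::chrono::steady_clock is monotonic, like CLOCK_MONOTONIC.
inline uint64_t monotonic_ms() {
    using namespace std::chrono;
    return static_cast<uint64_t>(
        duration_cast<milliseconds>(steady_clock::now().time_since_epoch()).count());
}
```

In the loop above, `uint64_t t0 = monotonic_ms(); infer_request.infer(); uint64_t dt = monotonic_ms() - t0;` would measure the same per-call latency.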
--- a/duix-sdk/src/main/cpp/dhunet/blendgram.cpp
+++ b/duix-sdk/src/main/cpp/dhunet/blendgram.cpp
@ -0,0 +1,437 @@
+#include <stdio.h>
+#include <math.h>
+#include <stdlib.h>
+
+#include "blendgram.h"
+
+
+  void  exColorBlend_Normal(uint8* T,uint8* A,uint8* B){ ColorBlend_Buffer(T,A,B,Normal); }
+  void  exColorBlend_Lighten(uint8* T,uint8* A,uint8* B)       { ColorBlend_Buffer(T,A,B,Lighten);}
+  void  exColorBlend_Darken(uint8* T,uint8* A,uint8* B)        { ColorBlend_Buffer(T,A,B,Darken);}
+  void  exColorBlend_Multiply(uint8* T,uint8* A,uint8* B)      { ColorBlend_Buffer(T,A,B,Multiply);}
+  void  exColorBlend_Average(uint8* T,uint8* A,uint8* B)       { ColorBlend_Buffer(T,A,B,Average);}
+  void  exColorBlend_Add(uint8* T,uint8* A,uint8* B)           { ColorBlend_Buffer(T,A,B,Add);}
+
+  void  exColorBlend_Subtract(uint8* T,uint8* A,uint8* B)      { ColorBlend_Buffer(T,A,B,Subtract);}
+  void  exColorBlend_Difference(uint8* T,uint8* A,uint8* B)    { ColorBlend_Buffer(T,A,B,Difference);}
+  void  exColorBlend_Negation(uint8* T,uint8* A,uint8* B)      { ColorBlend_Buffer(T,A,B,Negation);}
+  void  exColorBlend_Screen(uint8* T,uint8* A,uint8* B)        { ColorBlend_Buffer(T,A,B,Screen);}
+  void  exColorBlend_Exclusion(uint8* T,uint8* A,uint8* B)     { ColorBlend_Buffer(T,A,B,Exclusion);}
+
+  void  exColorBlend_Overlay(uint8* T,uint8* A,uint8* B)       { ColorBlend_Buffer(T,A,B,Overlay);}
+  void  exColorBlend_SoftLight(uint8* T,uint8* A,uint8* B)     { ColorBlend_Buffer(T,A,B,SoftLight);}
+  void  exColorBlend_HardLight(uint8* T,uint8* A,uint8* B)     { ColorBlend_Buffer(T,A,B,HardLight);}
+  void  exColorBlend_ColorDodge(uint8* T,uint8* A,uint8* B)    { ColorBlend_Buffer(T,A,B,ColorDodge);}
+  void  exColorBlend_ColorBurn(uint8* T,uint8* A,uint8* B)     { ColorBlend_Buffer(T,A,B,ColorBurn);}
+
+  void  exColorBlend_LinearDodge(uint8* T,uint8* A,uint8* B)   { ColorBlend_Buffer(T,A,B,LinearDodge);}
+  void  exColorBlend_LinearBurn(uint8* T,uint8* A,uint8* B)    { ColorBlend_Buffer(T,A,B,LinearBurn);}
+  void  exColorBlend_LinearLight(uint8* T,uint8* A,uint8* B)   { ColorBlend_Buffer(T,A,B,LinearLight);}
+  void  exColorBlend_VividLight(uint8* T,uint8* A,uint8* B)    { ColorBlend_Buffer(T,A,B,VividLight);}
+  void  exColorBlend_PinLight(uint8* T,uint8* A,uint8* B)      { ColorBlend_Buffer(T,A,B,PinLight);}
+
+  void  exColorBlend_HardMix(uint8* T,uint8* A,uint8* B)       { ColorBlend_Buffer(T,A,B,HardMix);}
+  void  exColorBlend_Reflect(uint8* T,uint8* A,uint8* B)       { ColorBlend_Buffer(T,A,B,Reflect);}
+  void  exColorBlend_Glow(uint8* T,uint8* A,uint8* B)          { ColorBlend_Buffer(T,A,B,Glow);}
+  void  exColorBlend_Phoenix(uint8* T,uint8* A,uint8* B)       { ColorBlend_Buffer(T,A,B,Phoenix);}
+
+typedef void (*BlendFunc) (uint8* T,uint8* A,uint8* B);
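+// Number of entries in blendfuncs. Modes index this table directly (0 = Normal, 24 = Phoenix),
+// whereas the ConstBlend_* constants in blendgram.h number Normal as 1.
+static const int MAX_FUNC = 25;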
+static BlendFunc blendfuncs[25]={
+  &exColorBlend_Normal,
+  &exColorBlend_Lighten,
+  &exColorBlend_Darken,
+  &exColorBlend_Multiply,
+  &exColorBlend_Average,
+  &exColorBlend_Add,
+
+  &exColorBlend_Subtract,
+  &exColorBlend_Difference,
+  &exColorBlend_Negation,
+  &exColorBlend_Screen,
+  &exColorBlend_Exclusion,
+
+  &exColorBlend_Overlay,
+  &exColorBlend_SoftLight,
+  &exColorBlend_HardLight,
+  &exColorBlend_ColorDodge,
+  &exColorBlend_ColorBurn,
+
+  &exColorBlend_LinearDodge,
+  &exColorBlend_LinearBurn,
+  &exColorBlend_LinearLight,
+  &exColorBlend_VividLight,
+  &exColorBlend_PinLight,
+
+  &exColorBlend_HardMix,
+  &exColorBlend_Reflect,
+  &exColorBlend_Glow,
+  &exColorBlend_Phoenix
+};
+
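+// Blends Src (A) with Mask (B) into Dest (T) using blendfuncs[Mode] on 4-byte pixels;
+// only channels 0-2 are written. Mode 0 (Normal) and Mode >= MAX_FUNC are ignored.
+void BlendGramSimp(unsigned char *Src,unsigned char* Mask, unsigned char *Dest, int Width, int Height, int Mode)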
+{
+	if(Mode<1)return;
+	if(Mode>=MAX_FUNC)return;
+	BlendFunc func=blendfuncs[Mode];
+	unsigned char *LinePS, *LinePD,*LinePM;
+	for (int Y = 0; Y < Height; Y += 1)
+	{
+		LinePS = Src + Y * Width * 4;
+		LinePM = Mask + Y * Width * 4;
+		LinePD = Dest + Y * Width * 4;
+		for (int X = 0; X < Width; X += 1)
+		{
+			func(LinePD,LinePS,LinePM);
+			LinePS += 4;
+			LinePM += 4;
+			LinePD += 4;
+		}
+	}
+}
+
+// Per-pixel alpha blend of 3-channel images with a 3-channel mask (only the first mask
+// channel of each pixel is read): Dest = Dest * m + Src * (1 - m), where m = Mask / 255.
+void BlendGramAlpha3(unsigned char *Src,unsigned char* Mask, unsigned char *Dest, int Width, int Height)
+{
+	unsigned char *LinePS, *LinePD,*LinePM;
+	for (int Y = 0; Y < Height; Y += 1)
+	{
+		LinePS = Src + Y * Width * 3;
+		LinePM = Mask + Y * Width * 3;
+		LinePD = Dest + Y * Width * 3;
+		for (int X = 0; X < Width; X += 1)
+		{
+			float alpha = *LinePM/255.0f;
+			float beta = 1.0f-alpha;
+			LinePD[0] = CLAMPCOLOR(LinePD[0]*alpha+LinePS[0]*beta);
+			LinePD[1] = CLAMPCOLOR(LinePD[1]*alpha+LinePS[1]*beta);
+			LinePD[2] = CLAMPCOLOR(LinePD[2]*alpha+LinePS[2]*beta);
+			LinePS += 3;
+			LinePM += 3;
+			LinePD += 3;
+		}
+	}
+}
+
+// Per-pixel alpha blend of 3-channel images with a 1-channel mask M:
+// Dest = (M * Dest + (255 - M) * Src) / 255, so M == 255 keeps Dest and M == 0 takes Src.
+void BlendGramAlpha(unsigned char *Src,unsigned char* Mask, unsigned char *Dest, int Width, int Height)
+{
+	unsigned char *LinePS, *LinePD,*LinePM;
+	for (int Y = 0; Y < Height; Y += 1)
+	{
+		LinePS = Src + Y * Width * 3;
+		LinePM = Mask + Y * Width * 1;
+		LinePD = Dest + Y * Width * 3;
+		for (int X = 0; X < Width; X += 1)
+		{
+			ColorBlend_Alpha(LinePD,LinePD,LinePS,*LinePM);
+			LinePS += 3;
+			LinePM += 1;
+			LinePD += 3;
+		}
+	}
+}
+
+// Same as BlendGramAlpha with the roles swapped:
+// Dest = (M * Src + (255 - M) * Dest) / 255, so M == 255 takes Src and M == 0 keeps Dest.
+void BlendGramAlphaRev(unsigned char *Src,unsigned char* Mask, unsigned char *Dest, int Width, int Height)
+{
+	unsigned char *LinePS, *LinePD,*LinePM;
+	for (int Y = 0; Y < Height; Y += 1)
+	{
+		LinePS = Src + Y * Width * 3;
+		LinePM = Mask + Y * Width * 1;
+		LinePD = Dest + Y * Width * 3;
+		for (int X = 0; X < Width; X += 1)
+		{
+			ColorBlend_Alpha(LinePD,LinePS,LinePD,*LinePM);
+			LinePS += 3;
+			LinePM += 1;
+			LinePD += 3;
+		}
+	}
+}
+
+
+
+
+/*
+void BlendGram(CBitmap* image,CBitmap* mask,int mode)
+{
+	if(mode<1)return;
+	if(mode>=MAX_FUNC)return;
+	BlendFunc func=blendfuncs[mode];
+	int Stride=image->width*4;
+		unsigned char *LinePS, *LinePD,*LinePM;
+	for (int Y = 0; Y < image->height; Y += 1)
+	{
+		LinePS = (unsigned char*)image->pixels +image->stride*Y;
+		LinePM = (unsigned char*)mask->pixels + mask->stride*Y;
+		LinePD = (unsigned char*)image->pixels +image->stride*Y;
+		for (int X = 0; X < image->width; X += 1)
+		{
+			func(LinePD,LinePS,LinePM);
+			LinePS += 4;
+			LinePM += 4;
+			LinePD += 4;
+		}
+	}
+}
+
+void BlendImageAdjustWithMask(CBitmap* bmp,CBitmap* adj,CBitmap* dst ,CBitmap* msk,int mode)
+{
+	unsigned char* bmppixels=(unsigned char*)bmp->pixels;
+	unsigned char* mskpixels=(unsigned char*)msk->pixels;
+	unsigned char* dstpixels=(unsigned char*)dst->pixels;
+	unsigned char* adjpixels=(unsigned char*)adj->pixels;
+	int stride=bmp->stride;
+	int width=bmp->width;
+	int height=bmp->height;
+	int X,Y;
+	unsigned char* LinePS , * LinePM , * LinePD , * LinePA ;
+	#pragma omp parallel for private(LinePS,LinePM,LinePD,LinePA,X,Y)
+	for (Y = 0; Y < height; Y ++)
+	{
+		int offset=stride*Y;
+		LinePS = bmppixels +offset;
+		LinePM = mskpixels +offset;
+		LinePD = dstpixels +offset;
+		LinePA = adjpixels +offset;
+		for (X = 0; X < width; X ++)
+		{
+			unsigned char M=*LinePM;
+			if(M==0xFF){
+				LinePD[0]=LinePS[0];
+				LinePD[1]=LinePS[1];
+				LinePD[2]=LinePS[2];
+			}else if(M==0x00){
+				LinePD[0]=LinePA[0];
+				LinePD[1]=LinePA[1];
+				LinePD[2]=LinePA[2];
+			}else{
+				ColorBlend_Alpha(LinePD,LinePS,LinePA,M);
+			}
+			LinePD[3]=LinePS[3];
+			LinePS += 4; LinePM += 4; LinePD += 4; LinePA += 4;
+		}
+	}
+}
+
+
+void BlendImageAdjustWithMaskEx(CBitmap* bmp,CBitmap* adj,CBitmap* dst ,CBitmap* msk,int mode)
+{
+	unsigned char* bmppixels=(unsigned char*)bmp->pixels;
+	unsigned char* mskpixels=(unsigned char*)msk->pixels;
+	unsigned char* dstpixels=(unsigned char*)dst->pixels;
+	unsigned char* adjpixels=(unsigned char*)adj->pixels;
+	int stride=bmp->stride;
+	int width=bmp->width;
+	int height=bmp->height;
+	int X,Y;
+	unsigned char* LinePS , * LinePM , * LinePD , * LinePA ;
+	#pragma omp parallel for private(LinePS,LinePM,LinePD,LinePA,X,Y)
+	for (Y = 0; Y < height; Y ++)
+	{
+		int offset=stride*Y;
+		LinePS = bmppixels +offset;
+		LinePM = mskpixels +offset;
+		LinePD = dstpixels +offset;
+		LinePA = adjpixels +offset;
+		for (X = 0; X < width; X ++)
+		{
+			unsigned char M=*LinePM;
+			if(M==0xFF){
+				LinePD[0]=LinePS[0];
+				LinePD[1]=LinePS[1];
+				LinePD[2]=LinePS[2];
+			}else if(M==0x00){
+				LinePD[0]=LinePA[0];
+				LinePD[1]=LinePA[1];
+				LinePD[2]=LinePA[2];
+			}else{
+				//ColorBlend_Alpha(LinePD,LinePS,LinePA,M);
+				LinePD[0]=LinePS[0]*M>>8;
+				LinePD[1]=LinePS[1]*M>>8;
+				LinePD[2]=LinePS[2]*M>>8;
+			}
+			LinePD[3]=M;
+			LinePS += 4; LinePM += 4; LinePD += 4; LinePA += 4;
+		}
+	}
+}
+
+
+
+
+void BlendImageAdjustWithAlpha(CBitmap* bmp,CBitmap* adj,CBitmap* dst ,int alpha,int mode){
+	unsigned char* bmppixels=(unsigned char*)bmp->pixels;
+	unsigned char* dstpixels=(unsigned char*)dst->pixels;
+	unsigned char* adjpixels=(unsigned char*)adj->pixels;
+	int stride=bmp->stride;
+	int width=bmp->width;
+	int height=bmp->height;
+	int X,Y;
+	unsigned char M=CLAMPCOLOR(alpha);
+	unsigned char *LinePS ,  *LinePD , *LinePA ;
+	#pragma omp parallel for private(LinePS,LinePD,LinePA,X,Y)
+	for (Y = 0; Y < height; Y ++)
+	{
+		int offset=stride*Y;
+		LinePS = bmppixels +offset;
+		LinePD = dstpixels +offset;
+		LinePA = adjpixels +offset;
+		for (X = 0; X < width; X ++)
+		{
+			if(M==0xFF){
+				LinePD[0]=LinePS[0];
+				LinePD[1]=LinePS[1];
+				LinePD[2]=LinePS[2];
+			}else if(M==0x00){
+				LinePD[0]=LinePA[0];
+				LinePD[1]=LinePA[1];
+				LinePD[2]=LinePA[2];
+			}else{
+				ColorBlend_Alpha(LinePD,LinePS,LinePA,M);
+			}
+			LinePD[3]=LinePS[3];
+			LinePS += 4;  LinePD += 4; LinePA += 4;
+		}
+	}
+}
+
+void BlendImageAdjustWithAlphaMask(CBitmap* bmp,CBitmap* adj,CBitmap* dst ,CBitmap* msk,int alpha,int mode){
+	unsigned char* bmppixels=(unsigned char*)bmp->pixels;
+	unsigned char* mskpixels=(unsigned char*)msk->pixels;
+	unsigned char* dstpixels=(unsigned char*)dst->pixels;
+	unsigned char* adjpixels=(unsigned char*)adj->pixels;
+	int stride=bmp->stride;
+	int width=bmp->width;
+	int height=bmp->height;
+	int X,Y;
+	unsigned char NM=CLAMPCOLOR(alpha);
+	unsigned char *LinePS , *LinePM , *LinePD , *LinePA ;
+	#pragma omp parallel for private(LinePS,LinePM,LinePD,LinePA,X,Y)
+	for (Y = 0; Y < height; Y ++)
+	{
+		int offset=stride*Y;
+		LinePS = bmppixels +offset;
+		LinePM = mskpixels +offset;
+		LinePD = dstpixels +offset;
+		LinePA = adjpixels +offset;
+		for (X = 0; X < width; X ++)
+		{
+			unsigned char M=*LinePM;
+			if(M==0xFF){
+				LinePD[0]=LinePS[0];
+				LinePD[1]=LinePS[1];
+				LinePD[2]=LinePS[2];
+			}else if(M==0x00){
+				if(NM==0xFF){
+					LinePD[0]=LinePS[0];
+					LinePD[1]=LinePS[1];
+					LinePD[2]=LinePS[2];
+				}else {
+					if(NM==0x00){
+					//none
+						LinePD[0]=LinePA[0];
+						LinePD[1]=LinePA[1];
+						LinePD[2]=LinePA[2];
+					}else{
+						ColorBlend_Alpha(LinePD,LinePS,LinePA,NM);
+					}
+				}
+			}else{
+				//
+				if(NM==0xFF){
+					LinePD[0]=LinePS[0];
+					LinePD[1]=LinePS[1];
+					LinePD[2]=LinePS[2];
+				}else{
+					if(NM==0x00){
+						ColorBlend_Alpha(LinePD,LinePS,LinePA,M);
+					}else{
+						ColorBlend_Alpha(LinePA,LinePS,LinePA,NM);
+						ColorBlend_Alpha(LinePD,LinePS,LinePA,M);
+					}
+				}
+			}
+			LinePD[3]=LinePS[3];
+			LinePS += 4; LinePM += 4; LinePD += 4; LinePA += 4;
+		}
+	}
+}
+
+void ReadAlphaBySrc(CBitmap* src,CBitmap* alpha){
+	memcpy(alpha,src,sizeof(CBitmap));
+	alpha->stride=src->width;
+	alpha->channel=1;
+	alpha->pixels=(CPixel*)malloc(alpha->width*alpha->height*sizeof(unsigned char));
+	unsigned char* bmppixels=(unsigned char*)src->pixels;
+	unsigned char* alapixels=(unsigned char*)alpha->pixels;
+	int stride=src->stride;
+	int width=src->width;
+	int height=src->height;
+	int X,Y;
+	unsigned char *LinePS ,  *LinePA;
+	//#pragma omp parallel for private(LinePS,LinePA)
+	for (Y = 0; Y < height; Y ++)
+	{
+		LinePS = bmppixels +stride*Y;
+		LinePA = alapixels +width*Y;
+		for (X = 0; X < width; X ++)
+		{
+			LinePA[0]=LinePS[3];
+			LinePS += 4;  LinePA ++;
+		}
+	}
+}
+
+
+void CheckAlpha(CBitmap* bmp,CBitmap* alpha)
+{
+	unsigned char* bmppixels=(unsigned char*)bmp->pixels;
+	unsigned char* alapixels=(unsigned char*)alpha->pixels;
+	int stride=bmp->stride;
+	int width=bmp->width;
+	int height=bmp->height;
+	int X,Y;
+	unsigned char *LinePS ,  *LinePA;
+	//#pragma omp parallel for private(LinePS,LinePA)
+	for (Y = 0; Y < height; Y ++)
+	{
+		LinePS = bmppixels +stride*Y;
+		LinePA = alapixels +width*Y;
+		for (X = 0; X < width; X ++)
+		{
+			//unsigned char M=LinePA[0];
+			if(*LinePA==0x00){
+				LinePS[0]=0;
+				LinePS[1]=0;
+				LinePS[2]=0;
+				LinePS[3]=0;
+			//}else if(M<0xff){
+				//if(LinePD[0]>M)LinePD[0]=M;
+				//if(LinePD[1]>M)LinePD[1]=M;
+				//if(LinePD[2]>M)LinePD[2]=M;
+				//LinePD[3]=M;
+			}else{
+			}
+			LinePS += 4;  LinePA++;
+		}
+	}
+}
+*/
+
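The per-channel arithmetic behind `BlendGramAlpha` comes from `ChannelBlend_AlphaEx` in `blendgram.h`. A minimal sketch of it on one pixel, assuming 8-bit channels; `alpha_blend_channel` and `blend_pixel_like_BlendGramAlpha` are hypothetical names. The operands are promoted to `int`, so the intermediate sum (at most 255 × 255) does not overflow before the division by 255.

```cpp
#include <cassert>
#include <cstdint>

// Same integer formula as ChannelBlend_AlphaEx(A, B, O) in blendgram.h:
// (O * A + (255 - O) * B) / 255, evaluated in int after promotion.
inline uint8_t alpha_blend_channel(uint8_t a, uint8_t b, uint8_t o) {
    return static_cast<uint8_t>((o * a + (255 - o) * b) / 255);
}

// What BlendGramAlpha does for one 3-channel pixel with a 1-channel mask m:
// dest = blend(dest, src, m), so m == 255 keeps dest and m == 0 takes src.
inline void blend_pixel_like_BlendGramAlpha(const uint8_t src[3], uint8_t m,
                                            uint8_t dest[3]) {
    assert(src != nullptr && dest != nullptr);
    for (int c = 0; c < 3; ++c) {
        dest[c] = alpha_blend_channel(dest[c], src[c], m);
    }
}
```

With `dest = 200`, `src = 100` and `m = 128`, the result is `(128 * 200 + 127 * 100) / 255 = 150` after integer truncation.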
--- a/duix-sdk/src/main/cpp/dhunet/blendgram.h
+++ b/duix-sdk/src/main/cpp/dhunet/blendgram.h
@ -0,0 +1,287 @@
+#ifndef __BLENDGRAM_H__
+#define __BLENDGRAM_H__
+
+#include <stdlib.h>
+#include <stdio.h>
+#include <math.h>
+
+typedef unsigned char uchar;
+#define CLAMPCOLOR(x) (uchar)((x)<(0)?(0):((x)>(255)?(255):(x)))
+
+#define MMAX(A,B)     ((A)>(B)?(A):(B))
+#define MMIN(A,B)     ((A)<(B)?(A):(B))
+
+static const int ConstBlend_Buffer = 0;
+static const int ConstBlend_Normal=ConstBlend_Buffer+1;
+static const int ConstBlend_Lighten=ConstBlend_Buffer+2;
+static const int ConstBlend_Darken=ConstBlend_Buffer+3;
+static const int ConstBlend_Multiply=ConstBlend_Buffer+4;
+static const int ConstBlend_Average=ConstBlend_Buffer+5;
+
+static const int ConstBlend_Add=ConstBlend_Buffer+6;
+static const int ConstBlend_Subtract=ConstBlend_Buffer+7;
+static const int ConstBlend_Difference=ConstBlend_Buffer+8;
+static const int ConstBlend_Negation=ConstBlend_Buffer+9;
+static const int ConstBlend_Screen=ConstBlend_Buffer+10;
+static const int ConstBlend_Exclusion=ConstBlend_Buffer+11;
+static const int ConstBlend_Overlay=ConstBlend_Buffer+12;
+static const int ConstBlend_SoftLight=ConstBlend_Buffer+13;
+static const int ConstBlend_HardLight=ConstBlend_Buffer+14;
+static const int ConstBlend_ColorDodge=ConstBlend_Buffer+15;
+static const int ConstBlend_ColorBurn=ConstBlend_Buffer+16;
+static const int ConstBlend_LinearDodge=ConstBlend_Buffer+17;
+static const int ConstBlend_LinearBurn=ConstBlend_Buffer+18;
+static const int ConstBlend_LinearLight=ConstBlend_Buffer+19;
+static const int ConstBlend_VividLight=ConstBlend_Buffer+20;
+static const int ConstBlend_PinLight=ConstBlend_Buffer+21;
+static const int ConstBlend_HardMix=ConstBlend_Buffer+22;
+static const int ConstBlend_Reflect=ConstBlend_Buffer+23;
+static const int ConstBlend_Glow=ConstBlend_Buffer+24;
+static const int ConstBlend_Phoenix=ConstBlend_Buffer+25;
+
+//void BlendGram(CBitmap* immage,CBitmap* mask,int mode);
+
+//#typedef unsigned char uint8
+#define uint8 unsigned char
+#define float64 double
+#define TRUE 1
+#define FALSE 0
+
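+// Take int (not uint8) so intermediate results above 255 or below 0 are compared before
+// narrowing; with uint8 parameters ChannelBlend_Add(200, 100) would wrap to 44 instead of 255.
+inline int mmin(int A,int B){
+	return A<B?A:B;
+}
+inline int mmax(int A,int B){
+	return A>B?A:B;
+}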
+
+#define ChannelBlend_Normal(A,B)     ((uint8)(A))
+#define ChannelBlend_Lighten(A,B)    ((uint8)((B > A) ? B:A))
+#define ChannelBlend_Darken(A,B)     ((uint8)((B > A) ? A:B))
+#define ChannelBlend_Multiply(A,B)   ((uint8)((A * B) / 255))
+#define ChannelBlend_Average(A,B)    ((uint8)((A + B) / 2))
+#define ChannelBlend_Add(A,B)        ((uint8)(mmin(255, (A + B))))
+#define ChannelBlend_Subtract(A,B)   ((uint8)((A + B < 255) ? 0:(A + B - 255)))
+#define ChannelBlend_Difference(A,B) ((uint8)(abs(A - B)))
+#define ChannelBlend_Negation(A,B)   ((uint8)(255 - abs(255 - A - B)))
+#define ChannelBlend_Screen(A,B)     ((uint8)(255 - (((255 - A) * (255 - B)) >> 8)))
+#define ChannelBlend_Exclusion(A,B)  ((uint8)(A + B - 2 * A * B / 255))
+#define ChannelBlend_Overlay(A,B)    ((uint8)((B < 128) ? (2 * A * B / 255):(255 - 2 * (255 - A) * (255 - B) / 255)))
+#define ChannelBlend_SoftLight(A,B)  ((uint8)((B < 128)?(2*((A>>1)+64))*((float)B/255):(255-(2*(255-((A>>1)+64))*(float)(255-B)/255))))
+#define ChannelBlend_HardLight(A,B)  (ChannelBlend_Overlay(B,A))
+#define ChannelBlend_ColorDodge(A,B) ((uint8)((B == 255) ? B:mmin(255, ((A << 8 ) / (255 - B)))))
+#define ChannelBlend_ColorBurn(A,B)  ((uint8)((B == 0) ? B:mmax(0, (255 - ((255 - A) << 8 ) / B))))
+#define ChannelBlend_LinearDodge(A,B)(ChannelBlend_Add(A,B))
+#define ChannelBlend_LinearBurn(A,B) (ChannelBlend_Subtract(A,B))
+#define ChannelBlend_LinearLight(A,B)((uint8)((B < 128)?ChannelBlend_LinearBurn(A,(2 * B)):ChannelBlend_LinearDodge(A,(2 * (B - 128)))))
+#define ChannelBlend_VividLight(A,B) ((uint8)((B < 128)?ChannelBlend_ColorBurn(A,(2 * B)):ChannelBlend_ColorDodge(A,(2 * (B - 128)))))
+#define ChannelBlend_PinLight(A,B)   ((uint8)((B < 128)?ChannelBlend_Darken(A,(2 * B)):ChannelBlend_Lighten(A,(2 * (B - 128)))))
+#define ChannelBlend_HardMix(A,B)    ((uint8)((ChannelBlend_VividLight(A,B) < 128) ? 0:255))
+#define ChannelBlend_Reflect(A,B)    ((uint8)((B == 255) ? B:mmin(255, (A * A / (255 - B)))))
+#define ChannelBlend_Glow(A,B)       (ChannelBlend_Reflect(B,A))
+#define ChannelBlend_Phoenix(A,B)    ((uint8)(mmin(A,B) - mmax(A,B) + 255))
+#define ChannelBlend_SoftEx(A,B)    (A*B/255+A*(255-((255-A)*(255-B)/255)-A*B/255)/255)
+
+#define ChannelBlend_Alpha(A,B,O)    ((uint8)(O * A + (1 - O) * B))
+#define ChannelBlend_AlphaEx(A,B,O)    ((uint8)((O * A + (255 - O) * B)/255))
+#define ChannelBlend_AlphaF(A,B,F,O) (ChannelBlend_AlphaEx(F(A,B),A,O))
+
+#define ColorBlend_Alpha(T,A,B,O)      (T)[0] = ChannelBlend_AlphaEx((A)[0], (B)[0],O), (T)[1] = ChannelBlend_AlphaEx((A)[1], (B)[1],O), (T)[2] = ChannelBlend_AlphaEx((A)[2], (B)[2],O)
+//, (T)[3] = ChannelBlend_AlphaEx((A)[3], (B)[3],O)
+#define ColorBlend_AlphaF(T,A,B,F,O)      (T)[0] = ChannelBlend_AlphaF((A)[0], (B)[0],F,O), (T)[1] = ChannelBlend_AlphaF((A)[1], (B)[1],F,O), (T)[2] = ChannelBlend_AlphaF((A)[2], (B)[2],F,O) , (T)[3] = ChannelBlend_AlphaEx((A)[3], (B)[3],O)
+
+
+#define ColorBlend_Buffer(T,A,B,M)      (T)[0] = ChannelBlend_##M((A)[0], (B)[0]), (T)[1] = ChannelBlend_##M((A)[1], (B)[1]), (T)[2] = ChannelBlend_##M((A)[2], (B)[2])
+
+#define ColorBlend_Normal(T,A,B)        (ColorBlend_Buffer(T,A,B,Normal))
+#define ColorBlend_Lighten(T,A,B)       (ColorBlend_Buffer(T,A,B,Lighten))
+#define ColorBlend_Darken(T,A,B)        (ColorBlend_Buffer(T,A,B,Darken))
+#define ColorBlend_Multiply(T,A,B)      (ColorBlend_Buffer(T,A,B,Multiply))
+#define ColorBlend_Average(T,A,B)       (ColorBlend_Buffer(T,A,B,Average))
+#define ColorBlend_Add(T,A,B)           (ColorBlend_Buffer(T,A,B,Add))
+#define ColorBlend_Subtract(T,A,B)      (ColorBlend_Buffer(T,A,B,Subtract))
+#define ColorBlend_Difference(T,A,B)    (ColorBlend_Buffer(T,A,B,Difference))
+#define ColorBlend_Negation(T,A,B)      (ColorBlend_Buffer(T,A,B,Negation))
+#define ColorBlend_Screen(T,A,B)        (ColorBlend_Buffer(T,A,B,Screen))
+#define ColorBlend_Exclusion(T,A,B)     (ColorBlend_Buffer(T,A,B,Exclusion))
+#define ColorBlend_Overlay(T,A,B)       (ColorBlend_Buffer(T,A,B,Overlay))
+#define ColorBlend_SoftLight(T,A,B)     (ColorBlend_Buffer(T,A,B,SoftLight))
+#define ColorBlend_HardLight(T,A,B)     (ColorBlend_Buffer(T,A,B,HardLight))
+#define ColorBlend_ColorDodge(T,A,B)    (ColorBlend_Buffer(T,A,B,ColorDodge))
+#define ColorBlend_ColorBurn(T,A,B)     (ColorBlend_Buffer(T,A,B,ColorBurn))
+#define ColorBlend_LinearDodge(T,A,B)   (ColorBlend_Buffer(T,A,B,LinearDodge))
+#define ColorBlend_LinearBurn(T,A,B)    (ColorBlend_Buffer(T,A,B,LinearBurn))
+#define ColorBlend_LinearLight(T,A,B)   (ColorBlend_Buffer(T,A,B,LinearLight))
+#define ColorBlend_VividLight(T,A,B)    (ColorBlend_Buffer(T,A,B,VividLight))
+#define ColorBlend_PinLight(T,A,B)      (ColorBlend_Buffer(T,A,B,PinLight))
+#define ColorBlend_HardMix(T,A,B)       (ColorBlend_Buffer(T,A,B,HardMix))
+#define ColorBlend_Reflect(T,A,B)       (ColorBlend_Buffer(T,A,B,Reflect))
+#define ColorBlend_Glow(T,A,B)          (ColorBlend_Buffer(T,A,B,Glow))
+#define ColorBlend_Phoenix(T,A,B)       (ColorBlend_Buffer(T,A,B,Phoenix))
+
+
+#define ColorBlend_Hue(T,B,L)            ColorBlend_Hls(T,B,L,HueL,LuminationB,SaturationB)
+#define ColorBlend_Saturation(T,B,L)     ColorBlend_Hls(T,B,L,HueB,LuminationB,SaturationL)
+#define ColorBlend_Color(T,B,L)          ColorBlend_Hls(T,B,L,HueL,LuminationB,SaturationL)
+#define ColorBlend_Luminosity(T,B,L)     ColorBlend_Hls(T,B,L,HueB,LuminationL,SaturationB)
+
+
+
+#define ColorBlend_Hls(T,B,L,O1,O2,O3) {                                        \
+    float64 HueB, LuminationB, SaturationB;                                     \
+    float64 HueL, LuminationL, SaturationL;                                     \
+    Color_RgbToHls((B)[2],(B)[1],(B)[0], &HueB, &LuminationB, &SaturationB);    \
+    Color_RgbToHls((L)[2],(L)[1],(L)[0], &HueL, &LuminationL, &SaturationL);    \
+    Color_HlsToRgb(O1,O2,O3,&(T)[2],&(T)[1],&(T)[0]);                           \
+    }
+
+
+/*********************************************************************/
+
+#define COLOR_OPAQUE                (0)
+#define COLOR_TRANSPARENT           (127)
+
+#define RGB_SIZE                    (3)
+#define RGB_BPP                     (24)
+#define RGB_MAXRED                  (255)
+#define RGB_MAXGREEN                (255)
+#define RGB_MAXBLUE                 (255)
+
+#define ARGB_SIZE                   (4)
+#define ARGB_BPP                    (32)
+#define ARGB_MAXALPHA               (127)
+#define ARGB_MAXRED                 (RGB_MAXRED)
+#define ARGB_MAXGREEN               (RGB_MAXGREEN)
+#define ARGB_MAXBLUE                (RGB_MAXBLUE)
+
+/*********************************************************************/
+
+#define Color_GetChannel(c,shift)   ((uint8)((c) >> (shift)))
+#define Color_Reverse(c,bpp)        ((((unsigned int)(uint8)(c) << 24) | ((unsigned int)(uint8)((c) >> 8 ) << 16) | ((unsigned int)(uint8)((c) >> 16) << 8 ) | \
+                                     ((unsigned int)(uint8)((c) >> 24))) >> (32 - (bpp)))
+
+#define Rgb_ByteWidth(width)        ((width) * RGB_SIZE)
+#define Rgb_PixelWidth(width)       ((width) / RGB_SIZE)
+
+#define Rgb_GetRed(rgb)             (Color_GetChannel(rgb, 0))
+#define Rgb_GetGreen(rgb)           (Color_GetChannel(rgb, 8))
+#define Rgb_GetBlue(rgb)            (Color_GetChannel(rgb, 16))
+
+#define Rgba_GetRed(rgba)           (Color_GetChannel(rgba, 24))
+#define Rgba_GetGreen(rgba)         (Color_GetChannel(rgba, 16))
+#define Rgba_GetBlue(rgba)          (Color_GetChannel(rgba, 8))
+#define Rgba_GetAlpha(rgba)         (Color_GetChannel(rgba, 0))
+
+#define Argb_GetAlpha(argb)         (Color_GetChannel(argb, 24))
+#define Argb_GetRed(argb)           (Color_GetChannel(argb, 16))
+#define Argb_GetGreen(argb)         (Color_GetChannel(argb, 8))
+#define Argb_GetBlue(argb)          (Color_GetChannel(argb, 0))
+
+#define MakeRgb(r,g,b)              (((uint32)(uint8)(b) << 16) | ((uint16)(uint8)(g) << 8 ) | (uint8)(r))
+#define MakeRgba(r,g,b,a)           (((uint32)(uint8)(r) << 24) | ((uint16)(uint8)(g) << 16) | ((uint16)(uint8)(b) << 8 ) | (uint8)(a))
+#define MakeArgb(a,r,g,b)           (((uint32)(uint8)(a) << 24) | ((uint32)(uint8)(r) << 16) | ((uint16)(uint8)(g) << 8 ) | (uint8)(b))
+#define HexToRgb(hex)               (MakeRgb((((hex) & 0xFF0000) >> 16), (((hex) & 0x00FF00) >> 8 ), ((hex) & 0xFF)))
+
+inline int Color_HueToRgb(float64 M1, float64 M2, float64 Hue, float64 *Channel)
+{
+    if (Hue < 0.0)
+        Hue += 1.0;
+    else if (Hue > 1.0)
+        Hue -= 1.0;
+
+    if ((6.0 * Hue) < 1.0)
+        *Channel = (M1 + (M2 - M1) * Hue * 6.0);
+    else if ((2.0 * Hue) < 1.0)
+        *Channel = (M2);
+    else if ((3.0 * Hue) < 2.0)
+        *Channel = (M1 + (M2 - M1) * ((2.0F / 3.0F) - Hue) * 6.0);
+    else
+        *Channel = (M1);
+
+    return TRUE;
+}
+
+inline void Color_RgbToHls(uint8 Red, uint8 Green, uint8 Blue, float64 *Hue, float64 *Lumination, float64 *Saturation)
+{
+    float64 Delta;
+    float64 Max, Min;
+    float64 Redf, Greenf, Bluef;
+
+    Redf    = (float64)Red   / 255.0;
+    Greenf  = (float64)Green / 255.0;
+    Bluef   = (float64)Blue  / 255.0;
+
+    //Max     = fmax(fmax(Redf, Greenf), Bluef);
+    //Min     = fmin(fmin(Redf, Greenf), Bluef);
+    Max     = MMAX(MMAX(Red, Green), Blue)/255.0;
+    Min     = MMIN(MMIN(Red, Green), Blue)/255.0;
+
+    *Hue        = 0;
+    *Lumination = (Max + Min) / 2.0F;
+    *Saturation = 0;
+
+    if (Max == Min)
+        return ;
+
+    Delta = (Max - Min);
+
+    if (*Lumination < 0.5)
+        *Saturation = Delta / (Max + Min);
+    else
+        *Saturation = Delta / (2.0 - Max - Min);
+
+    if (Redf == Max)
+        *Hue = (Greenf - Bluef) / Delta;
+    else if (Greenf == Max)
+        *Hue = 2.0 + (Bluef - Redf) / Delta;
+    else
+        *Hue = 4.0 + (Redf - Greenf) / Delta;
+
+    *Hue /= 6.0;
+
+    if (*Hue < 0.0)
+        *Hue += 1.0;
+
+}
+
+inline void Color_HlsToRgb(float64 Hue, float64 Lumination, float64 Saturation, uint8 *Red, uint8 *Green, uint8 *Blue)
+{
+    float64 M1, M2;
+    float64 Redf, Greenf, Bluef;
+
+    if (Saturation == 0) {
+        Redf    = Lumination;
+        Greenf  = Lumination;
+        Bluef   = Lumination;
+    } else {
+        if (Lumination <= 0.5)
+            M2 = Lumination * (1.0 + Saturation);
+        else
+            M2 = Lumination + Saturation - Lumination * Saturation;
+
+        M1 = (2.0 * Lumination - M2);
+
+        Color_HueToRgb(M1, M2, Hue + (1.0F / 3.0F), &Redf);
+        Color_HueToRgb(M1, M2, Hue, &Greenf);
+        Color_HueToRgb(M1, M2, Hue - (1.0F / 3.0F), &Bluef);
+    }
+
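+    // Round to nearest: Redf * 255 can land just below an integer (e.g. 127.99999...),
+    // which plain truncation would turn into 127.
+    *Red    = (uint8)(Redf * 255.0 + 0.5);
+    *Green  = (uint8)(Greenf * 255.0 + 0.5);
+    *Blue   = (uint8)(Bluef * 255.0 + 0.5);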
+
+}
+
+void BlendGramSimp(unsigned char *Src,unsigned char* Mask, unsigned char *Dest, int Width, int Height, int Mode);
+void BlendGramAlpha(unsigned char *Src,unsigned char* Mask, unsigned char *Dest, int Width, int Height);
+void BlendGramAlpha3(unsigned char *Src,unsigned char* Mask, unsigned char *Dest, int Width, int Height);
+void BlendGramAlphaRev(unsigned char *Src,unsigned char* Mask, unsigned char *Dest, int Width, int Height);
+/*
+void BlendImageAdjustWithMask(CBitmap* bmp,CBitmap* adj,CBitmap* dst ,CBitmap* msk,int mode);
+void BlendImageAdjustWithMaskEx(CBitmap* bmp,CBitmap* adj,CBitmap* dst ,CBitmap* msk,int mode);
+void BlendImageAdjustWithAlpha(CBitmap* bmp,CBitmap* adj,CBitmap* dst ,int alpha,int mode);
+void BlendImageAdjustWithAlphaMask(CBitmap* bmp,CBitmap* adj,CBitmap* dst ,CBitmap* msk,int alpha,int mode);
+
+void CheckAlpha(CBitmap* bmp,CBitmap* alpha);
+void ReadAlphaBySrc(CBitmap* src,CBitmap* alpha);
+*/
+
+#endif
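Several of the channel macros above produce intermediates wider than 8 bits (`A + B` in `Add`, `(A << 8) / (255 - B)` in `ColorDodge`, `A * A / (255 - B)` in `Reflect`, a possibly negative value in `ColorBurn`). Those values have to be compared and clamped as `int` before narrowing to `uint8`; with 8-bit parameters, `mmin(255, 200 + 100)` would see 300 wrapped to 44. A minimal sketch of the intended clamped arithmetic, written as plain functions with hypothetical names (`clamp_u8`, `blend_add`, `blend_color_dodge`, `blend_color_burn`, `blend_screen`) rather than the header's macros:

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper (not part of the SDK): clamp an int intermediate to 0..255, then narrow.
inline std::uint8_t clamp_u8(int v) {
    return static_cast<std::uint8_t>(std::min(255, std::max(0, v)));
}

// Add / linear dodge: saturates at 255 instead of wrapping.
inline std::uint8_t blend_add(std::uint8_t a, std::uint8_t b) {
    return clamp_u8(int(a) + int(b));
}

// Color dodge: a / (1 - b) in 8-bit fixed point; the quotient can exceed 255.
inline std::uint8_t blend_color_dodge(std::uint8_t a, std::uint8_t b) {
    if (b == 255) return 255;
    return clamp_u8((int(a) << 8) / (255 - int(b)));
}

// Color burn: 1 - (1 - a) / b; the intermediate can be far below 0.
inline std::uint8_t blend_color_burn(std::uint8_t a, std::uint8_t b) {
    if (b == 0) return 0;
    return clamp_u8(255 - ((255 - int(a)) << 8) / int(b));
}

// Screen: 1 - (1 - a)(1 - b); always within 0..255, so no clamp is needed.
inline std::uint8_t blend_screen(std::uint8_t a, std::uint8_t b) {
    return static_cast<std::uint8_t>(255 - ((255 - int(a)) * (255 - int(b))) / 255);
}
```

This sketch divides by 255 in `blend_screen`, whereas `ChannelBlend_Screen` above shifts by 8 (a cheaper approximation that can differ by one level).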
--- a/duix-sdk/src/main/cpp/dhunet/face_utils.cpp
+++ b/duix-sdk/src/main/cpp/dhunet/face_utils.cpp
@ -0,0 +1,133 @@
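+#include "face_utils.h"
+#include <cstdio>   // printf
+#include <cstdlib>  // malloc, free
+#include <ctime>    // clock_gettime, struct timespec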
+//#include <sys/timeb.h>
+
+/*
+cv::Mat resize_image(cv::Mat srcimg, int height, int width, int* top, int* left){
+    cv::Mat dstimg;
+    int srch = srcimg.rows, srcw = srcimg.cols;
+    int neww = width;
+    int newh = height;
+    if (srch != srcw) {
+        float hw_scale = (float)srch / srcw;
+        if (hw_scale > 1) {
+            newh = height;
+            neww = int(width / hw_scale);
+            cv::resize(srcimg, dstimg, cv::Size(neww, newh), cv::INTER_AREA);
+            *left = int((width - neww) * 0.5);
+            cv::copyMakeBorder(dstimg, dstimg, 0, 0, *left, width - neww - *left, cv::BORDER_CONSTANT, 0);
+        }
+        else
+        {
+            newh = (int)height * hw_scale;
+            neww = width;
+            cv::resize(srcimg, dstimg,cv::Size(neww, newh), cv::INTER_AREA);
+            *top = (int)(height - newh) * 0.5;
+            cv::copyMakeBorder(dstimg, dstimg, *top, height - newh - *top, 0, 0, cv::BORDER_CONSTANT, 0);
+
+        }
+    } else {
+        cv::resize(srcimg, dstimg, cv::Size(neww, newh), cv::INTER_AREA);
+    }
+    return dstimg;
+}
+*/
+
+
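+// Reads the whole file into a newly malloc()ed buffer (allocated 8000 bytes larger
+// than the file) and returns the file size; the caller owns *pbuf and must free() it.
+// Returns -1 and leaves *pbuf NULL if the file cannot be opened or the allocation fails.
+int dumpfile(char* file,char** pbuf){
+    *pbuf = NULL;
+    std::string fname(file);
+    std::ifstream cache(fname,std::ios::binary);
+    if(!cache.is_open())return -1;
+    cache.seekg(0,std::ios::end);
+    const int engSize = (int)cache.tellg();
+    if(engSize<0)return -1;
+    printf("===engsize %d\n",engSize );
+    cache.seekg(0,std::ios::beg);
+    char *modelMem = (char*)malloc(engSize+8000);
+    if(!modelMem)return -1;
+    cache.read(modelMem,engSize);
+    cache.close();
+    *pbuf = modelMem;
+    return engSize;
+}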
+
+void dumpchar(char* abuf,int len){
+    uint8_t* buf = (uint8_t*)abuf;
+    printf("\n----------------------chardump------------------------\n");
+    int i;
+    for(i = 0; i < len; i++) {
+        printf("=%u=", buf[i]);
+        if( (i+1) % 16 == 0) {
+            printf("\n");
+        }
+    }
+    if(i%16 != 0) {
+        printf("\n");
+    }
+    printf("\n----------------------chardump------------------------\n");
+}
+
+
+void dumpfloat(float* abuf,int len){
+    printf("\n----------------------floatdump------------------------\n");
+    int i;
+    for(i = 0; i < len; i++) {
+        printf("=%f=", abuf[i]);
+        if( (i+1) % 16 == 0) {
+            printf("\n");
+        }
+    }
+    if(i%16 != 0) {
+        printf("\n");
+    }
+    printf("\n----------------------floatdump------------------------\n");
+}
+
+void dumpshort(short* abuf,int len){
+    printf("\n----------------------shortdump------------------------\n");
+    int i;
+    for(i = 0; i < len; i++) {
+        printf("=%d=", abuf[i]);
+        if( (i+1) % 16 == 0) {
+            printf("\n");
+        }
+    }
+    if(i%16 != 0) {
+        printf("\n");
+    }
+    printf("\n----------------------shortdump------------------------\n");
+}
+
+
+void dumphex(char* abuf,int len){
+    unsigned char* buf = (unsigned char*)abuf;
+    int i = 0;
+    printf("\n----------------------hexdump------------------------\n");
+    for(i = 0; i < len; i++) {
+        printf("=%02x=", buf[i]);
+        if( (i+1) % 16 == 0) {
+            printf("\n");
+        }
+    }
+    if(i%16 != 0) {
+        printf("\n");
+    }
+    printf("---------------------hexdump-------------------------\n\n");
+}
+
+// Counts the byte positions at which the two buffers differ.
+int diffbuf(char* abuf,char* bbuf,int size){
+    int diff = 0;
+    for(int k= 0;k<size;k++){
+        if(abuf[k]!=bbuf[k])diff++;
+    }
+    return diff;
+}
+
+
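+// Milliseconds from CLOCK_MONOTONIC. Widen to 64 bits before multiplying: on 32-bit
+// ABIs `long` is 32 bits and tv_sec*1000 overflows after about 24.8 days of uptime.
+// tv_nsec is in nanoseconds, so divide by 1000000 (CLOCKS_PER_SEC describes clock()
+// ticks and only happens to equal 1000000 on POSIX systems).
+uint64_t aitimer_msstamp() {
+    struct timespec ts;
+    clock_gettime(CLOCK_MONOTONIC, &ts);
+    return (uint64_t)ts.tv_sec*1000u + (uint64_t)ts.tv_nsec/1000000u;
+}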
+
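`aitimer_msstamp()` converts a `CLOCK_MONOTONIC` timespec to milliseconds. The conversion can be sketched as a pure function so it can be checked in isolation; `ms_from_parts` and `monotonic_ms` are hypothetical names, and the arithmetic is done in 64 bits so that it cannot overflow on 32-bit ABIs where `long` and `time_t` are 32 bits wide.

```cpp
#include <cstdint>
#include <time.h>  // clock_gettime, CLOCK_MONOTONIC (POSIX)

// Hypothetical helper: seconds + nanoseconds -> milliseconds, in 64-bit arithmetic.
inline std::uint64_t ms_from_parts(std::int64_t sec, long nsec) {
    return static_cast<std::uint64_t>(sec) * 1000u
         + static_cast<std::uint64_t>(nsec) / 1000000u;  // 1 ms = 1,000,000 ns
}

// Monotonic timestamp in milliseconds; unaffected by wall-clock changes.
inline std::uint64_t monotonic_ms() {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return ms_from_parts(static_cast<std::int64_t>(ts.tv_sec), ts.tv_nsec);
}
```

For example, 3,000,000 s (about 35 days of uptime) times 1000 already exceeds the range of a 32-bit signed `long`, while `ms_from_parts` returns 3000000999 for `(3000000, 999999999)`.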
--- a/duix-sdk/src/main/cpp/dhunet/face_utils.h
+++ b/duix-sdk/src/main/cpp/dhunet/face_utils.h
@ -0,0 +1,21 @@
+#pragma once
+#include <stdint.h>
+#include <fstream>
+#include <sstream>
+#include <iostream>
+#include <vector>
+//#include <opencv2/dnn.hpp>
+//#include <opencv2/imgproc.hpp>
+//#include <opencv2/highgui.hpp>
+
+
+void dumpchar(char* abuf,int len);
+void dumphex(char* abuf,int len);
+void dumpshort(short* abuf,int len);
+void dumpfloat(float* abuf,int len);
+void dumpdouble(double* abuf,int len);  // declared only: face_utils.cpp has no definition
+int dumpfile(char* file,char** pbuf);
+int diffbuf(char* abuf,char* bbuf,int size);
+
+uint64_t aitimer_msstamp();
+
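The `dump*` helpers declared above write with `printf`, and in a normal Android app process native stdout is not shown in logcat by default. A sketch of a variant that builds the same `=xx=` layout as `dumphex` (16 bytes per line, banner lines omitted) into a `std::string` that can then be handed to a logger; `hexdump_string` is a hypothetical name, not part of the SDK:

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: same "=xx=" cells as dumphex(), returned instead of printed.
std::string hexdump_string(const void* buf, std::size_t len) {
    const unsigned char* p = static_cast<const unsigned char*>(buf);
    std::string out;
    char cell[8];  // "=xx=" plus the terminating NUL
    for (std::size_t i = 0; i < len; ++i) {
        std::snprintf(cell, sizeof(cell), "=%02x=", static_cast<unsigned>(p[i]));
        out += cell;
        if ((i + 1) % 16 == 0) out += '\n';  // 16 bytes per line
    }
    if (len % 16 != 0) out += '\n';  // finish a partial last line
    return out;
}
```

On Android the result could be passed to `__android_log_print(ANDROID_LOG_DEBUG, tag, "%s", s.c_str())` from `<android/log.h>`; large dumps may need to be split because logcat truncates long entries.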
--- a/duix-sdk/src/main/cpp/dhunet/jmat.cpp
+++ b/duix-sdk/src/main/cpp/dhunet/jmat.cpp
@ -0,0 +1,507 @@
+#include "jmat.h"
+
+extern "C"{
+#pragma pack(push)
+#pragma pack(4)
+
+  typedef struct _gpg_hdr {
+    char        head[4];
+    int         box[4];
+    int         size[4];
+    int         width[4];
+    int         height[4];
+    uint8_t     channel[4];
+    uint8_t     bit[4];
+  }gpg_hdr;
+#pragma pack(pop)
+}
+
+
+int JBuf::zeros(uint8_t val){
+  memset(m_buf,val,m_size);
+  return m_size;
+}
+
+int  JBuf::copyto(JBuf* dst){
+  if(m_size!=dst->m_size)return -1;
+  memcpy(dst->m_buf,m_buf,m_size);
+  return m_size;
+}
+
+int  JBuf::copyfrom(JBuf* src){
+  if(m_size!=src->m_size)return -1;
+  memcpy(m_buf,src->m_buf,src->m_size);
+  return m_size;
+}
+
+
+int JBuf::forceref(int bref){
+  //if(m_ref!=bref){
+  m_ref = bref;
+  //}
+  return 0;
+}
+
+JBuf::JBuf(uint32_t size,void* buf ){
+  if(buf){
+    m_ref = true;
+    m_buf = buf;
+    m_size = size;
+  }else{
+    m_ref = false;
+    m_size = size;
+    m_buf = malloc(size+1024);
+  }
+}
+
+JBuf::~JBuf(){
+  //printf("====%d free %p\n",m_ref,m_buf);
+  if(!m_ref){
+    free(m_buf);
+    m_buf = nullptr;
+  }
+}
+
+JBuf::JBuf(){
+  m_size = 0;
+  m_buf = nullptr;
+}
+
+JMat::JMat(){
+  init_tagarr();
+}
+
+void JMat::init_tagarr(){
+  memset(m_tagarr,0,512*sizeof(int));
+}
+
+int* JMat::tagarr(){
+  return m_tagarr;
+}
+
+int JMat::savegpg(std::string gpgfile){
+  gpg_hdr ghead;
+  memset(&ghead,0,sizeof(gpg_hdr));
+  ghead.head[0]='g';
+  ghead.head[1]='p';
+  ghead.head[2]='g';
+  ghead.head[3]='1';
+  ghead.size[0]=m_size;
+  ghead.width[0]=m_width;
+  ghead.height[0]=m_height;
+  ghead.channel[0]=m_channel;
+  ghead.bit[0]=m_bit;
+
+  FILE *gpgFile = NULL;
+  const char* fn = gpgfile.c_str();
+  if ((gpgFile = fopen(fn, "wb")) == NULL)return -1;
+  fwrite(&ghead,sizeof(gpg_hdr),1,gpgFile);
+  fwrite(m_buf, m_size, 1, gpgFile);
+  fclose(gpgFile);
+  return 0;
+}
+
+int JMat::load(std::string picfile,int flag){
+  const char* fn = picfile.c_str();
+  size_t len = strlen(fn);
+  if(len<4)return -1;
+  fn+= len-3;
+  int gpg = (fn[0]=='g')&&(fn[1]=='p')&&(fn[2]=='g');
+  if(gpg){
+    return loadgpg(picfile);
+  }else{
+    return loadjpg(picfile);
+  }
+
+}
+
+int JMat::loadgpg(std::string gpgfile){
+  FILE *gpgFile = NULL;
+  const char* fn = gpgfile.c_str();
+  if ((gpgFile = fopen(fn, "rb")) == NULL)return -1;
+  int rst = 0;
+  while(1){
+    gpg_hdr ghead;
+    memset(&ghead,0,sizeof(gpg_hdr));
+    fread(&ghead,sizeof(gpg_hdr),1,gpgFile);
+    char* arr=ghead.head;
+    if((arr[0]=='g')&&
+        (arr[1]=='p')&&
+        (arr[2]=='g')){
+
+      size_t imgSize  = ghead.size[0];
+      if(m_size<imgSize){
+        //printf("==m_size %d img size %d\n",m_size,imgSize);
+        if((!m_ref)&&m_buf)free(m_buf);
+        m_buf = malloc(imgSize);
+        m_ref = 0;
+      }
+      m_size = imgSize;
+      m_width = ghead.width[0];
+      m_height = ghead.height[0];
+      m_channel = ghead.channel[0];
+      m_bit = ghead.bit[0];
+      if(m_size&&fread(m_buf, m_size, 1, gpgFile)!=1)rst = -12;  // file shorter than the header claims
+    }else{
+      rst = -11;
+    }
+    break;
+  }
+  fclose(gpgFile);
+  return rst;
+}
+
+#ifdef USE_TURBOJPG
+#include "turbojpeg.h"
+int JMat::loadjpg(std::string picfile,int flag){
+  tjhandle tjInstance = NULL;
+  int rst = 0;
+  size_t jpegSize = 0;
+  size_t imgSize = 0;
+  int newbuf = 0;
+  unsigned char *jpegBuf = NULL;
+  if(1){
+    long size;
+    FILE *jpegFile = NULL;
+    const char* fn = picfile.c_str();
+    if ((jpegFile = fopen(fn, "rb")) == NULL)return -1;
+    if (fseek(jpegFile, 0, SEEK_END) < 0 || ((size = ftell(jpegFile)) < 0) || (fseek(jpegFile, 0, SEEK_SET) < 0)){
+      fclose(jpegFile);
+      return -2;
+    }
+    if (size == 0){
+      fclose(jpegFile);
+      return -3;
+    }
+    jpegSize = size;
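+    jpegBuf = (unsigned char*)tj3Alloc(jpegSize);
+    if(!jpegBuf){
+      fclose(jpegFile);
+      return -4;
+    }
+    if(fread(jpegBuf, jpegSize, 1, jpegFile) != 1){
+      fclose(jpegFile);
+      tj3Free(jpegBuf);
+      return -5;
+    }
+    fclose(jpegFile);
+  }
+  if ((tjInstance = tj3Init(TJINIT_DECOMPRESS)) == NULL){
+    tj3Free(jpegBuf);  // release the compressed data on this early exit too
+    return -11;
+  }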
+  while(1){
+    unsigned char *imgBuf = NULL;
+    int w, h;
+    int inSubsamp, inColorspace;
+    int pixelFormat = TJPF_BGR;
+    rst = tj3DecompressHeader(tjInstance, jpegBuf, jpegSize);
+    if(rst<0){
+      rst = -12;
+      break;
+    }
+    w = tj3Get(tjInstance, TJPARAM_JPEGWIDTH);
+    h = tj3Get(tjInstance, TJPARAM_JPEGHEIGHT);
+    inSubsamp = tj3Get(tjInstance, TJPARAM_SUBSAMP);
+    inColorspace = tj3Get(tjInstance, TJPARAM_COLORSPACE);
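+    // tj3Get() returns -1 on error, and a size_t can never be negative,
+    // so validate the dimensions before computing the buffer size.
+    if(w<=0||h<=0){
+      rst = -13;
+      break;
+    }
+    imgSize = (size_t)w * h * tjPixelSize[pixelFormat];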
+    //printf("===imgSize %d m_size %d\n",imgSize,m_size);
+    if(m_size<imgSize){
+      if((!m_ref)&&m_buf)free(m_buf);
+      m_buf = malloc(imgSize);
+      m_ref = 0;
+    }
+    m_size = imgSize;
+    imgBuf = (unsigned char *)m_buf;
+    if(tj3Decompress8(tjInstance, jpegBuf, jpegSize, imgBuf, 0, pixelFormat) < 0){
+      rst = -15;
+      break;
+    }
+    //m_ref = 0;
+    m_bit = 1;
+    m_channel = 3;
+    m_stride = w*3;
+    m_width = w;
+    m_height = h;
+    break;
+  }
+  if(jpegBuf)tj3Free(jpegBuf);
+  jpegBuf = NULL;
+  tj3Destroy(tjInstance);
+  tjInstance = NULL;
+  return rst;
+}
+
+#else
+int JMat::loadjpg(std::string picfile,int flag){
+  return -1;
+}
+#endif
+
+JMat::JMat(int w,int h,float *buf ,int c  ,int d ):JBuf(){
+  m_bit = sizeof(float);
+  m_width = w;
+  m_height = h;
+  m_channel = c;
+  m_stride = d?d:w*c;
+  m_size = m_bit*m_stride*m_height;
+  m_buf = buf;
+
+  m_ref = 1;
+  init_tagarr();
+}
+
+JMat::JMat(int w,int h,uint8_t *buf ,int c ,int d ):JBuf(){
+  m_bit = 1;
+  m_width = w;
+  m_height = h;
+  m_channel = c;
+  m_stride = d?d:w*c;
+  m_size = m_bit*m_stride*m_height;
+  //printf("===d %d stride %d width %d height %d m_size %d\n",d,m_stride,w,h,m_size);
+  m_buf = buf;
+  m_ref = 1;
+  init_tagarr();
+}
+
+JMat::JMat(int w,int h,int c ,int d ,int b):JBuf(){
+  m_bit = b==0?sizeof(float):b;
+  m_width = w;
+  m_height = h;
+  m_channel = c;
+  m_stride = d?d:w*c;
+  m_size = m_bit*m_stride*m_height;
+  //printf("===mat %d size %d\n",m_bit,m_size);
+  m_buf = malloc(m_size+m_bit*m_stride);
+  memset(m_buf,0,m_size+m_bit*m_stride);
+  m_ref = 0;
+  init_tagarr();
+}
+
+#ifdef USE_OPENCV
+
+cv::Mat  JMat::cvmat(){
+  if(m_channel == 3){
+    cv::Mat rrr(m_height,m_width,m_bit==1?CV_8UC3:CV_32FC3,m_buf);
+    return rrr;
+  }else if(m_channel == 1){
+    cv::Mat rrr(m_height,m_width,m_bit==1?CV_8UC1:CV_32FC1,m_buf);
+    return rrr;
+  }else{
+    cv::Mat rrr(m_height,m_width*m_channel,m_bit==1?CV_8UC1:CV_32FC1,m_buf);
+    return rrr;
+  }
+}
+
+int JMat::show(const char* title,int inx){
+  std::string name(title);
+  //printf("===show m_bit %d\n",m_bit);
+  if(m_bit==1){
+    cv::Mat mat(m_height,m_width,m_channel==3?CV_8UC3:CV_8UC1,m_buf);
+    if(inx){
+      std::string str = std::to_string(inx);
+      cv::Point pt;
+      pt.x = 180;
+      pt.y = 450;
+      cv::putText(mat,str,pt,cv::FONT_HERSHEY_SIMPLEX,2,cv::Scalar(0,255,0),4,cv::LINE_8);
+    }
+    cv::imshow(name,mat);
+  }else{
+    cv::Mat mat(m_height,m_width,m_channel==3?CV_32FC3:CV_32FC1,m_buf);
+    cv::imshow(name,mat);
+  }
+  return 0;
+}
+int JMat::tojpg(const char* fn){
+  int rst = 0;
+  if(m_bit==1){
+    cv::Mat mat(m_height,m_width,m_channel==3?CV_8UC3:CV_8UC1,m_buf);
+    std::string name(fn);
+    rst = cv::imwrite(name,mat);
+  }else{
+    printf("====ccc\n");
+    cv::Mat mat(m_height,m_width,m_channel==3?CV_32FC3:CV_32FC1,m_buf);
+    cv::Mat dst;
+    // convertTo keeps the channel count; scale [0,1] floats to [0,255] bytes
+    mat.convertTo(dst,CV_8U,255.0f);
+    std::string name(fn);
+    rst = cv::imwrite(name,dst);
+  }
+  return rst;
+}
+#else
+int JMat::show(const char* title,int tag){
+  return 0;
+}
+int JMat::tojpg(const char* fn){
+  return 0;
+}
+#endif
+
+
+int JMat::tobin(const char* fn){
+  FILE* file = fopen(fn, "wb");  // binary mode: raw bytes, no newline translation
+  if(!file)return 0;
+  fwrite(m_buf, m_size, 1, file);
+  fclose(file);
+  return 1;
+}
+
+JMat* JMat::reshape(int w,int h,int l,int t,int c){
+  int allh = h+t;
+  if(allh>m_height)return NULL;
+  int channel = c?c:m_channel;
+  JMat* mat = NULL;
+  if(m_bit==1){
+    uint8_t* buf = udata()+t*m_stride+l*m_channel;
+    mat= new JMat(w,h,buf,channel,m_stride);
+  }else{
+    float* buf = fdata()+t*m_stride+l*m_channel;
+    mat= new JMat(w,h,buf,channel,m_stride);
+  }
+  return mat;
+}
+
+JMat* JMat::refclone(int ref){
+  if(ref){
+    if(m_bit==1){
+      return new JMat(m_width,m_height,(uint8_t*)m_buf,m_channel,m_stride);
+    }else{
+      return new JMat(m_width,m_height,(float*)m_buf,m_channel,m_stride);
+    }
+  }else{
+    JMat* cm = new JMat(m_width,m_height,m_channel,m_stride,m_bit);
+    //printf("m_buf %p m_size %d\n",m_buf,m_size);
+    //printf("====w %d h %d c %d s %d refclone %d\n",m_width,m_height,m_channel,m_stride,ref);
+    memcpy(cm->m_buf,m_buf,m_size);
+    memcpy(cm->m_tagarr,m_tagarr,512*sizeof(int));
+    return cm;
+  }
+}
+
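+// NOTE: returns by value, but JBuf/JMat define no copy constructor or copy assignment.
+// An implicit copy shares m_buf with `cm`, whose destructor frees it: `JMat a; a = m.clone();`
+// leaves `a` dangling, and even `JMat x = m.clone();` relies on the compiler eliding the copy
+// of `cm` (NRVO), which C++ does not guarantee. refclone(0) returns an owned heap copy instead.
+JMat JMat::clone(){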
+  JMat cm(m_width,m_height,m_channel,m_stride,m_bit);
+  memcpy(cm.m_buf,m_buf,m_size);
+  memcpy(cm.m_tagarr,m_tagarr,512*sizeof(int));
+  return cm;
+}
+
+#ifdef USE_OPENCV
+JMat::JMat(std::string picfile,int flag):JBuf(){
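+  // NOTE: imread() is commented out, so `image` stays empty and this constructor
+  // yields a 0x0 matrix regardless of picfile; use load() to read .jpg/.gpg files.
+  cv::Mat image ;//= cv::imread(picfile);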
+  m_bit = flag?1:sizeof(float);
+  m_width = image.cols;
+  m_height = image.rows;
+  m_channel = 3;//image.channels();
+                //printf("===channels %d\n",m_channel);
+  m_stride = m_width*m_channel;
+  m_size = m_bit*m_stride*m_height;
+  m_buf = malloc(m_size+m_bit*m_stride);
+  m_ref = 0;
+  if(flag){
+    memcpy(m_buf,image.data,m_size);
+    //printf("===w %d h %d\n",image.cols,image.rows);
+    //cv::imshow("aaa",image);
+    //cv::waitKey(0);
+    //cv::Mat fmat(m_height,m_width,CV_8UC3,m_buf);
+    //float scale = 1.0f/255.0f;
+    //image.convertTo(fmat,CV_32F,scale);
+  }else{
+    cv::Mat fmat(m_height,m_width,CV_32FC3,m_buf);
+    float scale = 1.0f/255.0f;
+    image.convertTo(fmat,CV_32F,scale);
+  }
+  image.release();
+  init_tagarr();
+}
+#else
+JMat::JMat(std::string picfile,int flag):JBuf(){
+
+}
+#endif
+
+JMat::~JMat(){
+  //printf("====%d free 1 %p\n",m_ref,m_buf);
+}
+
+float* JMat::fdata(){
+  return (float*)m_buf;
+}
+
+float* JMat::frow(int row){
+  return ((float*)m_buf)+ row*m_stride;
+}
+
+char* JMat::row(int row){
+  return ((char*)m_buf)+ row*m_stride*m_bit;
+}
+
+float* JMat::fitem(int row,int col){
+  return ((float*)m_buf)+ row*m_stride + col;
+
+}
+
+void JMat::dump(){
+  printf("jmat %p size %u bit %d===\n",m_buf,m_size,m_bit);
+  printf("w %d h %d c %d s %d \n",m_width,m_height,m_channel,m_stride);
+  if(m_height!=1){
+    if(m_bit==4){
+      for(int k=0;k<m_height;k++){
+        float* buf = frow(k);
+        float* ebuf = buf + m_width -3;
+
+        printf("%d %f %f %f === %f %f %f\n",k,buf[0],buf[1],buf[2],ebuf[0],ebuf[1],ebuf[2]);
+      }
+    }else{
+      for(int k=0;k<m_height;k++){
+        short* buf = (short*)row(k);
+        short* ebuf = buf + m_width -6;
+        printf("%d %hd %hd %hd === %hd %hd %hd\n",k,buf[0],buf[1],buf[2],ebuf[0],ebuf[1],ebuf[2]);
+
+      }
+    }
+  }else{
+    short* buf = (short*)row(0);
+    for(int k=0;k<m_width/640;k++){
+      short* ebuf = buf + 640 -6;
+
+      printf("%d %hd %hd %hd === %hd %hd %hd\n",k,buf[0],buf[1],buf[2],ebuf[0],ebuf[1],ebuf[2]);
+      buf += 640;
+    }
+  }
+}
+
+
+uint8_t* JMat::udata(){
+  return (uint8_t*)m_buf;
+}
+/*
+   nc::NdArray<float> JMat::ncarray(){
+   bool own = false;
+   nc::NdArray<float> arr = nc::NdArray<float>((float*)m_buf, m_height, m_width, own);
+   return arr;
+   }
+   */
+
+
+#ifdef USE_NCNN
+ncnn::Mat JMat::packingmat(){
+  ncnn::Mat in_pack(m_width,m_height,1,(void*)m_buf,(size_t)4u*3,3);
+  ncnn::Mat in ;
+  ncnn::convert_packing(in_pack,in,1);
+  return in;
+}
+
+ncnn::Mat           JMat::ncnnmat(){
+  unsigned char* data = (unsigned char*)m_buf;
+  if(m_channel == 3){
+    ncnn::Mat mat = ncnn::Mat::from_pixels(data, ncnn::Mat::PIXEL_BGR, m_width, m_height);
+    return mat;
+  }else if(m_channel == 4){
+    ncnn::Mat mat = ncnn::Mat::from_pixels(data, ncnn::Mat::PIXEL_BGRA, m_width, m_height);
+    return mat;
+  }else if(m_channel == 1){
+    ncnn::Mat mat = ncnn::Mat::from_pixels(data, ncnn::Mat::PIXEL_GRAY, m_width, m_height);
+    return mat;
+  }else {
+    ncnn::Mat mat = ncnn::Mat::from_pixels(data, ncnn::Mat::PIXEL_GRAY, m_width*m_channel, m_height);
+    return mat;
+  }
+}
+
+
+#endif
+
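`JMat::reshape` builds a non-owning view into the parent buffer at element offset `t*m_stride + l*m_channel` and reuses the parent stride; it verifies only that `t + h` fits within the parent's height. A hypothetical helper (`roi_offset`, not part of the SDK) showing the same offset arithmetic together with a horizontal bound, so that the last element of each window row stays inside the parent row:

```cpp
#include <cstddef>

// Hypothetical helper: element offset of pixel (left, top) in a packed row-major
// matrix of `rows` rows, `stride` elements per row and `channels` elements per pixel.
// Returns -1 if a w x h window with `sub_channels` elements per pixel does not fit.
inline long roi_offset(int rows, int stride, int channels,
                       int left, int top, int w, int h, int sub_channels) {
    if (left < 0 || top < 0 || w <= 0 || h <= 0) return -1;
    if (top + h > rows) return -1;                       // vertical bound (as in reshape)
    long first = static_cast<long>(left) * channels;    // element column of the first pixel
    if (first + static_cast<long>(w) * sub_channels > stride) return -1;  // horizontal bound
    return static_cast<long>(top) * stride + first;
}
```

For a 10x10 three-channel parent (stride 30), a 4x4 window at (1, 2) starts at element 63, while one at (8, 0) would need elements 24..35 of a 30-element row and is rejected.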
--- a/duix-sdk/src/main/cpp/dhunet/jmat.h
+++ b/duix-sdk/src/main/cpp/dhunet/jmat.h
@ -0,0 +1,99 @@
+#pragma once
+#include <stdio.h>
+#include <stdlib.h>
+#include <stdint.h>   // uint8_t, uint32_t used by JBuf/JMat
+#include <memory>
+#include <vector>
+#include <string.h>
+#include <string>
+
+//#include "NumCpp.hpp"
+#define USE_OPENCV
+#define USE_NCNN
+#define USE_TURBOJPG
+//#define USE_PPLCV
+
+#ifdef USE_OPENCV
+#include "opencv2/core.hpp"
+#include "opencv2/imgproc.hpp"
+#include "opencv2/highgui.hpp"
+#endif
+
+#ifdef USE_NCNN
+#include "mat.h"
+#endif
+
+#ifdef USE_EIGEN
+#include "eigen3/Eigen/Core"
+typedef Eigen::Matrix<float, 1, Eigen::Dynamic, Eigen::RowMajor> Vectorf;
+typedef Eigen::Matrix<std::complex<float>, 1, Eigen::Dynamic, Eigen::RowMajor> Vectorcf;
+typedef Eigen::Matrix<float, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor> Matrixf;
+typedef Eigen::Matrix<std::complex<float>, Eigen::Dynamic, Eigen::Dynamic, Eigen::RowMajor> Matrixcf;
+#endif
+
+class JBuf{
+
+    public:
+        bool        m_ref = 0;
+        uint32_t    m_size = 0;
+        void*       m_buf = NULL;
+    public:
+        uint32_t    size(){return m_size;} ;
+        void*       data(){return m_buf;};
+        bool        ref(){return m_ref;};
+        int         zeros(uint8_t val=0);
+        int         copyfrom(JBuf* src);
+        int         copyto(JBuf* dst);
+        int         forceref(int bref);
+        JBuf();
+        JBuf(uint32_t size,void* buf = nullptr);
+        virtual ~JBuf();
+};
+
+class JMat:public JBuf{
+    public:
+        int     m_bit = 0;
+        int     m_width = 0;
+        int     m_height = 0;
+        int     m_channel = 0;
+        int     m_stride = 0;
+        int     m_tagarr[512];
+        void    init_tagarr();
+    public:
+        int height(){return m_height;}
+        int width(){return m_width;}
+        int stride(){return m_stride;}
+        int channel(){return m_channel;}
+        JMat(int w,int h,float *buf ,int c = 3 ,int d = 0);
+        JMat(int w,int h,uint8_t *buf ,int c = 3 ,int d = 0);
+        JMat(int w,int h,int c = 3,int d = 0,int b=0);
+        JMat(std::string picfile,int flag=0);
+        JMat();
+        int load(std::string picfile,int flag=0);
+        int loadjpg(std::string picfile,int flag=0);
+        int savegpg(std::string gpgfile);
+        int loadgpg(std::string gpgfile);
+        float* fdata();
+        char* row(int row);
+        float* frow(int row);
+        float* fitem(int row,int col);
+        int tojpg(const char* fn);
+        int tobin(const char* fn);
+        int show(const char* title,int inx = 0);
+        JMat clone();
+        JMat* refclone(int ref=1);
+        JMat* reshape(int w,int h,int l,int t,int c=0);
+        uint8_t* udata();
+        virtual ~JMat();
+        int*    tagarr();
+        void     dump();
+        //nc::NdArray<float> ncarray();
+#ifdef USE_OPENCV
+        cv::Mat             cvmat();
+#endif
+#ifdef USE_NCNN
+        ncnn::Mat           ncnnmat();
+        ncnn::Mat           packingmat();
+#endif
+        //Matrixf  tomatrix();
+};
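`JBuf` owns `m_buf` whenever `m_ref` is false, yet it declares neither a copy constructor nor a copy assignment operator, so the compiler-generated ones copy the raw pointer and two owning objects end up freeing the same buffer. A minimal sketch of how an owning buffer can make ownership explicit by deleting copies and implementing moves; `OwnedBuf` and `make_zeroed` are hypothetical names, not part of the SDK:

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <type_traits>

// Hypothetical owning/non-owning buffer with explicit copy and move rules.
class OwnedBuf {
public:
    explicit OwnedBuf(std::uint32_t size)                 // owning, zero-filled
        : m_size(size), m_buf(size ? std::malloc(size) : nullptr), m_ref(false) {
        if (m_buf) std::memset(m_buf, 0, size);
    }
    OwnedBuf(std::uint32_t size, void* external)          // non-owning view
        : m_size(size), m_buf(external), m_ref(true) {}
    ~OwnedBuf() { if (!m_ref) std::free(m_buf); }

    OwnedBuf(const OwnedBuf&) = delete;                   // no accidental shallow copies
    OwnedBuf& operator=(const OwnedBuf&) = delete;

    OwnedBuf(OwnedBuf&& o) noexcept : m_size(o.m_size), m_buf(o.m_buf), m_ref(o.m_ref) {
        o.m_buf = nullptr; o.m_size = 0; o.m_ref = true;  // source no longer owns anything
    }
    OwnedBuf& operator=(OwnedBuf&& o) noexcept {
        if (this != &o) {
            if (!m_ref) std::free(m_buf);                 // release what we owned
            m_size = o.m_size; m_buf = o.m_buf; m_ref = o.m_ref;
            o.m_buf = nullptr; o.m_size = 0; o.m_ref = true;
        }
        return *this;
    }

    std::uint32_t size() const { return m_size; }
    void* data() const { return m_buf; }
    bool ref() const { return m_ref; }

private:
    std::uint32_t m_size;
    void* m_buf;
    bool m_ref;
};

static_assert(!std::is_copy_constructible<OwnedBuf>::value, "copies must be explicit");
static_assert(std::is_nothrow_move_constructible<OwnedBuf>::value, "moves transfer ownership");

// Returning by value is safe: the buffer is moved (or the move is elided), never shared.
inline OwnedBuf make_zeroed(std::uint32_t n) { return OwnedBuf(n); }
```

Applying the same pattern to `JBuf` (deleting the copy operations, or implementing a deep copy) would turn an accidental shallow copy into a compile error instead of a double free; whether existing callers copy `JMat` objects would need to be checked first.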
--- a/duix-sdk/src/main/cpp/dhunet/malpha.cpp
+++ b/duix-sdk/src/main/cpp/dhunet/malpha.cpp
@ -0,0 +1,184 @@
+#include "malpha.h"
+
+MWorkMat::MWorkMat(JMat* pic,JMat* msk,const int* boxs,int kind){
+    m_boxx = boxs[0];
+    m_boxy=boxs[1];
+    m_boxwidth=boxs[2]-m_boxx;
+    m_boxheight=boxs[3]-m_boxy;
+    //printf("x %d y %d w %d h %d \n",m_boxx,m_boxy,m_boxwidth,m_boxheight);
+    m_pic = pic;
+    m_msk = msk;
+
+
+    if(kind==168){
+
+      srcw = 168;
+      edge = 4;
+      adjw = 160;
+      mskx = 5;
+      msky = 5;
+      mskw = 150;
+      mskh = 145;
+
+    }else if(kind==128){
+      srcw = 134;
+      edge = 3;
+      adjw = 128;
+      mskx = 4;
+      msky = 4;
+      mskw = 120;
+      mskh = 120;
+
+
+    }
+
+    pic_realadjw = new JMat(adjw,adjw,3,0,1);
+    pic_maskadjw = new JMat(adjw,adjw,3,0,1);
+    //pic_cropadjw = new JMat(adjw,adjw,3,0,1);
+
+    msk_realadjw = new JMat(adjw,adjw,1,0,1);
+
+}
+
+MWorkMat::~MWorkMat(){
+    matpic_orgsrcw.release();
+    matpic_roirst.release();
+    delete pic_realadjw;
+    delete pic_maskadjw;
+    delete msk_realadjw;
+    if(pic_cloneadjw) delete pic_cloneadjw;
+}
+
+int MWorkMat::munet(JMat** ppic,JMat** pmsk){
+
+    *ppic = pic_realadjw;
+    *pmsk = pic_maskadjw;
+    return 0;
+}
+
+int MWorkMat::premunet(){
+    matpic_roisrc = cv::Mat(m_pic->cvmat(),cv::Rect(m_boxx,m_boxy,m_boxwidth,m_boxheight));
+    cv::resize(matpic_roisrc , matpic_orgsrcw, cv::Size(srcw, srcw), cv::INTER_AREA);
+    matpic_roiadjw = cv::Mat(matpic_orgsrcw,cv::Rect(edge,edge,adjw,adjw));
+    cv::Mat cvmask = pic_maskadjw->cvmat();
+    cv::Mat cvreal = pic_realadjw->cvmat();
+    //printf("===matpic %d %d\n",matpic_roiadjw.cols,matpic_roiadjw.rows);
+    //printf("===cvreal %d %d\n",cvreal.cols,cvreal.rows);
+    //getchar();
+    matpic_roiadjw.copyTo(cvreal);
+    matpic_roiadjw.copyTo(cvmask);
+    if(pic_cloneadjw) delete pic_cloneadjw;  // do not leak the clone from a previous call
+    pic_cloneadjw = pic_realadjw->refclone(0);
+    cv::rectangle(cvmask,cv::Rect(mskx,msky,mskw,mskh),cv::Scalar(0,0,0),-1);//,cv::LineTypes::FILLED);
+    return 0;
+}
+
+int MWorkMat::finmunet(JMat* fgpic){
+    cv::Mat cvreal = pic_realadjw->cvmat();
+
+        //for(int k=0;k<16;k++){
+            //cv::line(cvreal,cv::Point(0,k*10),cv::Point(adjw,k*10),cv::Scalar(0,255,0));
+        //}
+        //for(int k=0;k<16;k++){
+            //cv::line(cvreal,cv::Point(k*10,0),cv::Point(k*10,adjw),cv::Scalar(0,255,0));
+        //}
+    cvreal.copyTo(matpic_roiadjw);
+    //cv::imwrite("accpre.bmp",matpic_orgsrcw);
+    if(m_msk) vtacc((uint8_t*)matpic_orgsrcw.data,srcw*srcw);
+    //cv::imwrite("accend.bmp",matpic_orgsrcw);
+    if(fgpic&&m_msk&&(fgpic->width()==srcw)){   // the 4-channel path reads m_msk, which may be NULL
+      std::vector<cv::Mat> list;
+      cv::split(matpic_orgsrcw,list);
+      matmsk_roisrc = cv::Mat(m_msk->cvmat(),cv::Rect(m_boxx,m_boxy,m_boxwidth,m_boxheight));
+      cv::resize(matmsk_roisrc , matmsk_orgsrcw, cv::Size(srcw, srcw), cv::INTER_AREA);
+      cv::Mat rrr(srcw,srcw,CV_8UC1);
+      cv::cvtColor(matmsk_orgsrcw,rrr,cv::COLOR_RGB2GRAY);
+      list.push_back(rrr);
+      cv::merge(list,fgpic->cvmat());
+    }else{
+      cv::resize(matpic_orgsrcw, matpic_roirst, cv::Size(m_boxwidth, m_boxheight), cv::INTER_AREA);
+      if(fgpic){
+        matpic_roisrc = cv::Mat(fgpic->cvmat(),cv::Rect(m_boxx,m_boxy,m_boxwidth,m_boxheight));
+        matpic_roirst.copyTo(matpic_roisrc);
+      }else{
+        matpic_roirst.copyTo(matpic_roisrc);
+      }
+    }
+    return 0;
+}
+
+int MWorkMat::alpha(JMat** preal,JMat** pimg,JMat** pmsk){
+    *preal = pic_cloneadjw;
+    *pimg =  pic_realadjw;
+    *pmsk =  msk_realadjw;
+    return 0;
+}
+
+int MWorkMat::prealpha(){
+    //printf("x %d y %d w %d h %d \n",m_boxx,m_boxy,m_boxwidth,m_boxheight);
+    matmsk_roisrc = cv::Mat(m_msk->cvmat(),cv::Rect(m_boxx,m_boxy,m_boxwidth,m_boxheight));
+    cv::resize(matmsk_roisrc , matmsk_orgsrcw, cv::Size(srcw, srcw), cv::INTER_AREA);
+
+    matmsk_roiadjw = cv::Mat(matmsk_orgsrcw,cv::Rect(edge,edge,adjw,adjw));
+    cv::Mat cvmask = msk_realadjw->cvmat();
+    cv::cvtColor(matmsk_roiadjw,cvmask,cv::COLOR_RGB2GRAY);
+    return 0;
+}
+
+int MWorkMat::finalpha(){
+    cv::Mat cvmask = msk_realadjw->cvmat();
+    cv::cvtColor(cvmask,matmsk_roiadjw,cv::COLOR_GRAY2RGB);
+    //
+    cv::resize(matmsk_orgsrcw, matmsk_roirst, cv::Size(m_boxwidth, m_boxheight), cv::INTER_AREA);
+    matmsk_roirst.copyTo(matmsk_roisrc);
+    return 0;
+}
+
+int MWorkMat::vtacc(uint8_t* buf,int count){
+    /*
+    int avgr = 0;
+    int avgb = 0;
+    int avgg = 0;
+    if(1){
+        uint8_t* pb = m_pic->udata();
+        for(int k=0;k<10;k++){
+            avgr += *pb++;
+            avgg += *pb++;
+            avgb += *pb++;
+        }
+        avgr =avgr/10 +10;
+        avgg =avgg/10 -20;
+        if(avgg<0)avgg=0;
+        avgb =avgb/10 + 10;
+    }
+    */
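+    // Green-spill suppression: cap the middle (green) byte of each
+    // interleaved 3-byte pixel at the mean of the two outer channels.
+    uint8_t* pb = buf;
+    for(int k=0;k<count;k++){
+        int sum  = (pb[0]+ pb[2])/2;   // integer mean, same truncation as the float version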
+        if(pb[1]>=sum){
+            pb[1]=sum;
+            //pb[0]=0;
+            //pb[2]=0;
+            // }else if((pb[0]<avgr)&&(pb[1]>avgg)&&(pb[2]<avgb)){
+            //pb[1]=0;
+            //pb[0]=0;
+            //pb[2]=0;
+        }
+        pb+=3;
+    }
+    /*
+    long sum = 0l;
+    float  mean = sum*0.5f/count;
+    uint8_t maxg = (mean>255.f)?255:mean;
+    //printf("sum %ld mean %f maxg %d\n",sum,mean,maxg);
+    //getchar();
+    pb = buf +1;
+    for(int k=0;k<count;k++){
+        if(*pb>maxg){
+            *pb = maxg;
+        }
+        pb+=3;
+    }
+    */
+    return 0;
+}
+
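`MWorkMat::vtacc` caps the middle byte of every interleaved 3-byte pixel at the mean of the two outer bytes; because green sits in the middle in both RGB and BGR order, the rule works for either layout. Below is a minimal standalone sketch of that per-pixel rule on a `std::vector<uint8_t>`, assuming tightly packed pixels with no row padding. The function name `suppress_green_spill` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical standalone version of the rule in MWorkMat::vtacc.
// buf holds interleaved 3-channel pixels (R,G,B or B,G,R); green is the
// middle byte either way.
void suppress_green_spill(std::vector<uint8_t>& buf) {
    for (std::size_t i = 0; i + 2 < buf.size(); i += 3) {
        // Integer mean of the two outer channels (truncates like the original).
        int mean = (buf[i] + buf[i + 2]) / 2;
        // Cap green so it never exceeds that mean; other channels are untouched.
        if (buf[i + 1] >= mean) {
            buf[i + 1] = static_cast<uint8_t>(mean);
        }
    }
}
```

For example, a pixel `(100, 200, 50)` becomes `(100, 75, 50)`, while `(10, 5, 20)` is left as is because its green value is already below the mean of 15.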
--- a/duix-sdk/src/main/cpp/dhunet/malpha.h
+++ b/duix-sdk/src/main/cpp/dhunet/malpha.h
@ -0,0 +1,56 @@
+#pragma once
+#include "jmat.h"
+//#include <simpleocv.h>
+#include <opencv2/core/core.hpp>
+#include <opencv2/highgui/highgui.hpp>
+#include <opencv2/imgproc/imgproc.hpp>
+#include <stdio.h>
+
+class MWorkMat{
+  private:
+    int     srcw = 168;
+    int     edge = 4;
+    int     adjw = 160;
+
+    int     mskx = 5;
+    int     msky = 5;
+    int     mskw = 150;
+    int     mskh = 145;
+    int     m_boxx;
+    int     m_boxy;
+    int     m_boxwidth;
+    int     m_boxheight;
+    JMat*   m_pic;
+    JMat*   m_msk;
+
+    JMat*   pic_realadjw;//blendimg
+    JMat*   pic_maskadjw;
+
+    cv::Mat matpic_roisrc;//box area
+    cv::Mat matpic_orgsrcw;
+    cv::Mat matpic_roiadjw;
+    JMat*   pic_cloneadjw = nullptr;//blendimg (set by premunet, deleted in the destructor)
+    cv::Mat matpic_roirst;
+
+    //
+    JMat*   msk_realadjw;
+
+    cv::Mat matmsk_roisrc;//box area
+    cv::Mat matmsk_orgsrcw;
+    cv::Mat matmsk_roiadjw;
+
+    cv::Mat matmsk_roirst;
+
+    int vtacc(uint8_t* buf,int count);
+  public:
+    MWorkMat(JMat* pic,JMat* msk,const int* boxs,int kind=168);
+    int premunet();
+    int munet(JMat** ppic,JMat** pmsk);
+    int finmunet(JMat* fgpic=NULL);
+    int prealpha();
+    int alpha(JMat** preal,JMat** pimg,JMat** pmsk);
+    int finalpha();
+
+    virtual ~MWorkMat();
+};
+
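The member initializers in `malpha.h` hold the 168 geometry, and the constructor switches to a second set when `kind == 128`. In both sets `srcw - 2*edge == adjw`: the network input is the centered `adjw`×`adjw` crop of the face box resized to `srcw`×`srcw`, and the rectangle zeroed in the masked input lies inside that crop. The sketch below collects the constants in a hypothetical `CropGeometry` struct and checks those two properties. It only mirrors the values above and is not part of the SDK.

```cpp
#include <cassert>

// Hypothetical table mirroring the constants chosen in MWorkMat's constructor.
struct CropGeometry {
    int srcw;                    // side of the resized face box
    int edge;                    // border trimmed on each side
    int adjw;                    // side of the network input crop
    int mskx, msky, mskw, mskh;  // rectangle zeroed in the masked input
};

CropGeometry crop_geometry(int kind) {
    if (kind == 128) return {134, 3, 128, 4, 4, 120, 120};
    return {168, 4, 160, 5, 5, 150, 145};  // kind 168, also the header defaults
}

// True if the crop is centered and the mask rectangle fits inside it.
bool geometry_consistent(const CropGeometry& g) {
    return g.srcw - 2 * g.edge == g.adjw &&
           g.mskx + g.mskw <= g.adjw &&
           g.msky + g.mskh <= g.adjw;
}
```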
--- a/duix-sdk/src/main/cpp/dhunet/munet.cpp
+++ b/duix-sdk/src/main/cpp/dhunet/munet.cpp
@ -0,0 +1,275 @@
+#include "munet.h"
+#include "cpu.h"
+#include "face_utils.h"
+#include "blendgram.h"
+
+Mobunet::Mobunet(const char* fnbin,const char* fnparam,const char* fnmsk,int wenetstep,int rgb){
+  m_rgb = rgb;
+  m_wenetstep = wenetstep;
+    initModel(fnbin,fnparam,fnmsk);
+}
+
+Mobunet::Mobunet(const char* modeldir,const char* modelid,int rgb){
+  m_rgb = rgb;
+    char fnbin[1024];
+    char fnparam[1024];
+    char fnmsk[1024];
+    snprintf(fnbin,sizeof(fnbin),"%s/%s.bin",modeldir,modelid);
+    snprintf(fnparam,sizeof(fnparam),"%s/%s.param",modeldir,modelid);
+    snprintf(fnmsk,sizeof(fnmsk),"%s/weight_168u.bin",modeldir);
+    initModel(fnbin,fnparam,fnmsk);
+}
+
+int Mobunet::initModel(const char* binfn,const char* paramfn,const char* mskfn){
+    unet.clear();
+    //ncnn::set_cpu_powersave(2);
+    //ncnn::set_omp_num_threads(2);//ncnn::get_big_cpu_count());
+    //unet.opt = ncnn::Option();
+    unet.opt.use_vulkan_compute = false;
+    unet.opt.num_threads = ncnn::get_big_cpu_count();   // 1
+    //unet.load_param("model/mobileunet_v5_wenet_sim.param");
+    //unet.load_model("model/mobileunet_v5_wenet_sim.bin");
+    unet.load_param(paramfn);
+    unet.load_model(binfn);
+    char* wbuf = NULL;
+    dumpfile((char*)mskfn,&wbuf);
+    printf("===mskfn %s\n",mskfn);
+    mat_weights = new JMat(160,160,(uint8_t*)wbuf,1);
+    mat_weights->forceref(0);
+    mat_weightmin = new JMat(128,128,1);
+    cv::Mat ma = mat_weights->cvmat();
+    cv::Mat mb;
+    cv::resize(ma,mb,cv::Size(128,128));
+    cv::Mat mc = mat_weightmin->cvmat();
+    mb.copyTo(mc);
+    return 0;
+}
+
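+Mobunet::~Mobunet(){
+    unet.clear();
+    if(mat_weights){
+        delete mat_weights;
+        mat_weights = nullptr;
+    }
+    if(mat_weightmin){
+        delete mat_weightmin;   // 128x128 weights allocated in initModel()
+        mat_weightmin = nullptr;
+    }
+}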
+
+int Mobunet::domodelold(JMat* pic,JMat* msk,JMat* feat){
+    JMat  picall(160*160,2,3,0,1);
+    uint8_t* buf = picall.udata();
+    int width = pic->width();
+    int height = pic->height();
+
+    cv::Mat c1(height,width,CV_8UC3,buf);
+    cv::Mat c2(height,width,CV_8UC3,buf+width*height*3);
+    cv::cvtColor(pic->cvmat(),c1,cv::COLOR_RGB2BGR);
+    cv::cvtColor(msk->cvmat(),c2,cv::COLOR_RGB2BGR);
+    ncnn::Mat inall ;
+      inall = ncnn::Mat::from_pixels(buf, ncnn::Mat::PIXEL_BGR, 160*160, 2);
+    inall.substract_mean_normalize(mean_vals, norm_vals);
+    //inall.reshape(160,160,6);
+    ncnn::Mat inwenet(256,20,1,feat->data());
+    ncnn::Mat outpic;
+    ncnn::Extractor ex = unet.create_extractor();
+    ex.input("face", inall);
+    ex.input("audio", inwenet);
+    ex.extract("output", outpic);
+    float outmean_vals[3] = {-1.0f, -1.0f, -1.0f};
+    float outnorm_vals[3] = { 0.5f,  0.5f,  0.5f};
+    outpic.substract_mean_normalize(outmean_vals, outnorm_vals);
+    ncnn::Mat pakpic;
+    ncnn::convert_packing(outpic,pakpic,3);
+    cv::Mat cvadj(160,160,CV_32FC3,pakpic.data);
+    //dumpfloat((float*)cvadj.data,160*160*3);
+    cv::Mat cvreal;
+    float scale = 255.0f;
+    cvadj.convertTo(cvreal,CV_8UC3,scale);
+    cv::Mat cvmask;
+    cv::cvtColor(cvreal,cvmask,cv::COLOR_RGB2BGR);
+    BlendGramAlpha((uchar*)cvmask.data,(uchar*)mat_weights->data(),(uchar*)pic->data(),160,160);
+    return 0;
+}
+
+int Mobunet::domodel(JMat* pic,JMat* msk,JMat* feat,int rect){
+  int width = pic->width();
+  int height = pic->height();
+    ncnn::Mat inmask = ncnn::Mat::from_pixels(msk->udata(), m_rgb?ncnn::Mat::PIXEL_RGB:ncnn::Mat::PIXEL_BGR2RGB, rect, rect);
+    inmask.substract_mean_normalize(mean_vals, norm_vals);
+    ncnn::Mat inreal = ncnn::Mat::from_pixels(pic->udata(), m_rgb?ncnn::Mat::PIXEL_RGB:ncnn::Mat::PIXEL_BGR2RGB, rect, rect);
+    inreal.substract_mean_normalize(mean_vals, norm_vals);
+    ncnn::Mat inpic(width,height,6);
+    float* buf = (float*)inpic.data;
+    float* pr = (float*)inreal.data;
+    memcpy(buf,pr,inreal.cstep*sizeof(float)*inreal.c);
+    buf+= inpic.cstep*inreal.c;
+    float* pm = (float*)inmask.data;
+    memcpy(buf,pm,inmask.cstep*sizeof(float)*inmask.c);
+    float* pf = (float*)feat->data();
+    if(m_wenetstep==10){
+      pf+= 256*5;
+    }
+    ncnn::Mat inwenet(256,m_wenetstep,1,pf);
+    ncnn::Mat outpic;
+    ncnn::Extractor ex = unet.create_extractor();
+    ex.input("face", inpic);
+    ex.input("audio", inwenet);
+    //printf("===debug ncnn\n");
+    ex.extract("output", outpic);
+    float outmean_vals[3] = {-1.0f, -1.0f, -1.0f};
+    float outnorm_vals[3] = { 127.5f,  127.5f,  127.5f};
+    outpic.substract_mean_normalize(outmean_vals, outnorm_vals);
+    cv::Mat cvout(height,width,CV_8UC3);   // cv::Mat takes (rows, cols)
+    outpic.to_pixels(cvout.data,m_rgb?ncnn::Mat::PIXEL_RGB:ncnn::Mat::PIXEL_RGB2BGR);
+
+    if(rect==160){
+      BlendGramAlpha((uchar*)cvout.data,(uchar*)mat_weights->data(),(uchar*)pic->data(),width,height);
+    }else{
+      BlendGramAlpha((uchar*)cvout.data,(uchar*)mat_weightmin->data(),(uchar*)pic->data(),width,height);
+    }
+    return 0;
+}
+
+
+int Mobunet::preprocess(JMat* pic,JMat* feat){
+    //pic 168
+    cv::Mat roipic(pic->cvmat(),cv::Rect(4,4,160,160));
+    JMat  picmask(160,160,3,0,1);
+    JMat  picreal(160,160,3,0,1);
+    cv::Mat cvmask = picmask.cvmat();
+    cv::Mat cvreal = picreal.cvmat();
+    roipic.copyTo(cvmask);
+    roipic.copyTo(cvreal);
+    cv::rectangle(cvmask,cv::Rect(5,5,150,145),cv::Scalar(0,0,0),-1);//,cv::LineTypes::FILLED);
+    domodel(&picreal,&picmask,feat);
+    cvreal.copyTo(roipic);
+    return 0;
+}
+
+int Mobunet::fgprocess(JMat* pic,const int* boxs,JMat* feat,JMat* fg){
+    int boxx, boxy ,boxwidth, boxheight ;
+    boxx = boxs[0];boxy=boxs[1];boxwidth=boxs[2]-boxx;boxheight=boxs[3]-boxy;
+    int stride = pic->stride();
+    cv::Mat roisrc(pic->cvmat(),cv::Rect(boxx,boxy,boxwidth,boxheight));
+    cv::Mat cvorig;
+    cv::resize(roisrc , cvorig, cv::Size(168, 168), cv::INTER_AREA);
+    JMat  pic168(168,168,(uint8_t*)cvorig.data);
+    preprocess(&pic168,feat);
+    cv::Mat cvrst;
+    cv::resize(cvorig , cvrst, cv::Size(boxwidth, boxheight), cv::INTER_AREA);
+    cv::Mat roidst(fg->cvmat(),cv::Rect(boxx,boxy,boxwidth,boxheight));
+    cvrst.copyTo(roidst);
+    return 0;
+}
+
+int Mobunet::process(JMat* pic,const int* boxs,JMat* feat){
+    int boxx, boxy ,boxwidth, boxheight ;
+    boxx = boxs[0];boxy=boxs[1];boxwidth=boxs[2]-boxx;boxheight=boxs[3]-boxy;
+    int stride = pic->stride();
+    cv::Mat roisrc(pic->cvmat(),cv::Rect(boxx,boxy,boxwidth,boxheight));
+    cv::Mat cvorig;
+    cv::resize(roisrc , cvorig, cv::Size(168, 168), cv::INTER_AREA);
+    JMat  pic168(168,168,(uint8_t*)cvorig.data);
+    preprocess(&pic168,feat);
+    cv::Mat cvrst;
+    cv::resize(cvorig , cvrst, cv::Size(boxwidth, boxheight), cv::INTER_AREA);
+    cvrst.copyTo(roisrc);
+    return 0;
+}
+
+int Mobunet::process2(JMat* pic,const int* boxs,JMat* feat){
+    int boxx, boxy ,boxwidth, boxheight ;
+    boxx = boxs[0];boxy=boxs[1];boxwidth=boxs[2]-boxx;boxheight=boxs[3]-boxy;
+    int stride = pic->stride();
+
+    cv::Mat cvsrc = pic->cvmat();
+    printf("cvsrc %d %d \n",cvsrc.cols,cvsrc.rows);
+    cv::Mat roisrc(cvsrc,cv::Rect(boxx,boxy,boxwidth,boxheight));
+    cv::Mat cvorig;
+    cv::resize(roisrc , cvorig, cv::Size(168, 168), cv::INTER_AREA);
+    /*
+    uint8_t* data =(uint8_t*)pic->data() + boxy*stride + boxx*pic->channel();
+    int scale_w = 168;
+    int scale_h = 168;
+    ncnn::Mat prepic = ncnn::Mat::from_pixels_resize(data, ncnn::Mat::PIXEL_BGR, boxwidth, boxheight, stride,scale_w, scale_h);
+    //pic 168
+    cv::Mat cvorig(168,168,CV_8UC3,prepic.data);
+     */
+
+    cv::Mat roimask(cvorig,cv::Rect(4,4,160,160));
+    JMat  picmask(160,160,3,0,1);
+    JMat  picreal(160,160,3,0,1);
+    cv::Mat cvmask = picmask.cvmat();
+    cv::Mat cvreal = picreal.cvmat();
+    roimask.copyTo(cvmask);
+    roimask.copyTo(cvreal);
+
+    cv::rectangle(cvmask,cv::Rect(5,5,150,150),cv::Scalar(0,0,0),-1);//,cv::LineTypes::FILLED);
+
+    ncnn::Mat inmask = ncnn::Mat::from_pixels(picmask.udata(), ncnn::Mat::PIXEL_BGR2RGB, 160, 160);
+    inmask.substract_mean_normalize(mean_vals, norm_vals);
+    ncnn::Mat inreal = ncnn::Mat::from_pixels(picreal.udata(), ncnn::Mat::PIXEL_BGR2RGB, 160, 160);
+    inreal.substract_mean_normalize(mean_vals, norm_vals);
+
+    JMat  picin(160*160,2,3);
+    char*  pd = (char*)picin.data();
+    memcpy(pd,inreal.data,160*160*3*4);
+    memcpy(pd+ 160*160*3*4,inmask.data,160*160*3*4);
+
+//    char* pinpic = NULL;
+//    dumpfile("pic.bin",&pinpic);
+//    dumpfloat((float*)pd,10);
+//    dumpfloat((float*)pinpic,10);
+    //ncnn::Mat inpic(160,160,6,pd,4);
+    ncnn::Mat inpack(160,160,1,pd,(size_t)4u*6,6);
+    ncnn::Mat inpic;
+    ncnn::convert_packing(inpack,inpic,1);
+
+//    char* pwenet = NULL;
+//    dumpfile("wenet.bin",&pwenet);
+    ncnn::Mat inwenet(256,20,1,feat->data(),4);
+    ncnn::Mat outpic;
+    ncnn::Extractor ex = unet.create_extractor();
+    ex.input("face", inpic);
+    ex.input("audio", inwenet);
+    ex.extract("output", outpic);
+
+    float outmean_vals[3] = {-1.0f, -1.0f, -1.0f};
+//    float outnorm_vals[3] = { 2.0f,  2.0f,  2.0f};
+    float outnorm_vals[3] = { 127.5f,  127.5f,  127.5f};
+    outpic.substract_mean_normalize(outmean_vals, outnorm_vals);
+
+    ncnn::Mat pakpic;
+    ncnn::convert_packing(outpic,pakpic,3);
+
+    cv::Mat cvadj(160,160,CV_32FC3,pakpic.data);
+    cv::Mat cvout(160,160,CV_8UC3);
+    float scale = 1.0f;
+    cvadj.convertTo(cvout,CV_8UC3,scale);
+    //cv::imwrite("cvout.jpg",cvout);
+    cv::cvtColor(cvout,roimask,cv::COLOR_RGB2BGR);
+//    cvout.copyTo(roimask);
+
+    //cv::imwrite("roimask.jpg",roimask);
+    //cv::imwrite("cvorig.jpg",cvorig);
+    //cv::waitKey(0);
+    cv::resize(cvorig , roisrc, cv::Size(boxwidth, boxheight), cv::INTER_AREA);
+    //cv::imwrite("roisrc.jpg",roisrc);
+        //cv::imshow("cvsrc",cvsrc);
+//    cv::imshow("roisrc",roisrc);
+//    cv::imshow("cvorig",cvorig);
+//    cv::waitKey(20);
+    /*
+    {
+        uint8_t *pr = (uint8_t *) cvoutc.data;
+        printf("==%u %u %u\n", pr[0], pr[1], pr[2]);
+    }
+    //
+    float* p = (float*)cvadj.data;
+    printf("==%f %f %f\n",p[0],p[1],p[2]);
+    p+=160*160;
+    printf("==%f %f %f\n",p[0],p[1],p[2]);
+    p+=160*160;
+    printf("==%f %f %f\n",p[0],p[1],p[2]);
+    */
+    return 0;
+}
+
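The input and output normalizations in `Mobunet` are inverses. ncnn's `substract_mean_normalize(mean, norm)` computes `(x - mean) * norm` per channel, so `mean_vals = 127.5` with `norm_vals = 1/127.5` maps pixels from [0, 255] to [-1, 1], and the output step with `mean = -1`, `norm = 127.5` maps [-1, 1] back to [0, 255]. (`domodelold` gets the same result with `norm = 0.5` followed by `convertTo(..., 255.0f)`.) A scalar sketch of both directions follows; the function names are hypothetical, and the round-to-nearest and clamping in `from_model_range` are the sketch's own choices rather than a claim about how ncnn's `to_pixels` converts.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Input side, as with mean_vals = 127.5 and norm_vals = 1/127.5:
// (x - mean) * norm maps a byte in [0, 255] to [-1, 1] (up to float rounding).
float to_model_range(uint8_t px) {
    return (px - 127.5f) * (1.0f / 127.5f);
}

// Output side, as with outmean_vals = -1 and outnorm_vals = 127.5:
// (v - (-1)) * 127.5 maps [-1, 1] back to [0, 255], then clamps to a byte.
uint8_t from_model_range(float v) {
    long x = std::lround((v - (-1.0f)) * 127.5f);
    if (x < 0) x = 0;
    if (x > 255) x = 255;
    return static_cast<uint8_t>(x);
}
```

Every byte value survives the round trip `from_model_range(to_model_range(p)) == p`, which is a quick way to confirm that a pair of mean/norm settings really are inverses.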
--- a/duix-sdk/src/main/cpp/dhunet/munet.h
+++ b/duix-sdk/src/main/cpp/dhunet/munet.h
@ -0,0 +1,31 @@
+#pragma once
+#include "jmat.h"
+#include "net.h"
+#include <opencv2/core/core.hpp>
+#include <opencv2/highgui/highgui.hpp>
+#include <opencv2/imgproc/imgproc.hpp>
+#include <stdio.h>
+#include <vector>
+
+
+class Mobunet{
+    private:
+      int m_wenetstep = 20;
+      int m_rgb =0;
+        ncnn::Net unet;
+        float mean_vals[3] = {127.5f, 127.5f, 127.5f};
+        float norm_vals[3] = {1 / 127.5f, 1 / 127.5f, 1 / 127.5f};
+        JMat*   mat_weights = nullptr;
+        JMat*   mat_weightmin = nullptr;
+        int initModel(const char* binfn,const char* paramfn,const char* mskfn);
+    public:
+        int domodel(JMat* pic,JMat* msk,JMat* feat,int rect = 160);
+        int domodelold(JMat* pic,JMat* msk,JMat* feat);
+        int preprocess(JMat* pic,JMat* feat);
+        int process(JMat* pic,const int* boxs,JMat* feat);
+        int fgprocess(JMat* pic,const int* boxs,JMat* feat,JMat* fg);
+        int process2(JMat* pic,const int* boxs,JMat* feat);
+        Mobunet(const char* modeldir,const char* modelid,int rgb = 0);
+        Mobunet(const char* fnbin,const char* fnparam,const char* fnmsk,int wenetstep = 20,int rgb = 0);
+        ~Mobunet();
+};
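`m_wenetstep` sets how many 256-float BNF frames reach the model's `audio` input (`ncnn::Mat inwenet(256, m_wenetstep, 1, pf)`). For a 10-step model, `domodel` advances the feature pointer by `256*5` floats, so it feeds frames 5 to 14. Assuming the feature block holds 20 frames, as the 20-step calls elsewhere in this file suggest, that is the centered half of the window. The sketch below models the slice on a flat row-major `frames × 256` buffer. The names `audio_window`, `kBnfDim` and `kWindowFrames` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kBnfDim = 256;       // floats per BNF frame (as in inwenet)
constexpr std::size_t kWindowFrames = 20;  // assumed frames per feature block

// Returns the frames the model sees: all 20 for step 20, or frames 5..14 for
// step 10, mirroring `if(m_wenetstep==10) pf += 256*5;` in Mobunet::domodel.
std::vector<float> audio_window(const std::vector<float>& feat, std::size_t step) {
    assert(feat.size() >= kWindowFrames * kBnfDim);
    std::size_t first = (step == 10) ? 5 : 0;
    auto begin = feat.begin() + static_cast<std::ptrdiff_t>(first * kBnfDim);
    return std::vector<float>(begin, begin + static_cast<std::ptrdiff_t>(step * kBnfDim));
}
```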
--- a/duix-sdk/src/main/cpp/duix/gjduix.cpp
+++ b/duix-sdk/src/main/cpp/duix/gjduix.cpp
@ -0,0 +1,290 @@
+#include <stdlib.h>
+#include <pthread.h>
+#include "gjduix.h"
+#include "dhwenet.h"
+#include "wenetai.h"
+#include "dhpcm.h"
+#include "munet.h"
+#include "malpha.h"
+#include <string.h>
+
+struct dhmfcc_s{
+  int mincalc;
+  int minoff;  
+  int minblock;  
+  int maxblock;  
+  int inited;
+  char* wenetfn;
+
+  //DhWenet* wenet;
+  WeAI*   weai_first;
+  WeAI*   weai_common;
+  PcmSession* cursess;
+  PcmSession* presess;
+  volatile uint64_t  sessid;
+
+  volatile int running;
+  pthread_t *calcthread;
+  pthread_mutex_t pushmutex;
+  pthread_mutex_t readmutex;
+};
+
+
+static void *calcworker(void *arg){
+  dhmfcc_t* mfcc = (dhmfcc_t*)arg;
+  uint64_t sessid = 0;
+  while(mfcc->running){
+    int rst = 0;
+    PcmSession* sess = mfcc->cursess;
+    if(sess &&(sess->sessid()==mfcc->sessid)){
+      rst = sess->runcalc(mfcc->sessid,mfcc->weai_common,mfcc->mincalc);
+    }
+    if(rst!=1){
+      jtimer_mssleep(20);
+    }else{
+      jtimer_mssleep(10);
+    }
+  }
+  return NULL;
+}
+
+int dhmfcc_alloc(dhmfcc_t** pdg,int mincalc){
+  dhmfcc_t* mfcc = (dhmfcc_t*)malloc(sizeof(dhmfcc_t));
+  memset(mfcc,0,sizeof(dhmfcc_t));
+  mfcc->mincalc = mincalc?mincalc:1;
+  mfcc->minoff = STREAM_BASE_MINOFF;
+  mfcc->minblock = STREAM_BASE_MINBLOCK;
+  mfcc->maxblock = STREAM_BASE_MAXBLOCK;
+  pthread_mutex_init(&mfcc->pushmutex,NULL);
+  pthread_mutex_init(&mfcc->readmutex,NULL);
+  mfcc->calcthread = (pthread_t *)malloc(sizeof(pthread_t) );
+  mfcc->running = 1;
+  pthread_create(mfcc->calcthread, NULL, calcworker, (void*)mfcc);
+  *pdg = mfcc;
+  return 0;
+}
+
+int dhmfcc_initPcmex(dhmfcc_t* dg,int maxsize,int minoff ,int minblock ,int maxblock){
+  dg->minoff = minoff;
+  dg->minblock = minblock;
+  dg->maxblock = maxblock;
+  dg->inited = 1;
+#ifdef WENETOPENV
+  if(dg->wenetfn){
+    //
+    std::string fnonnx(dg->wenetfn);
+    std::string fnovbin = fnonnx+"_ov.bin";
+    std::string fnovxml = fnonnx+"_ov.xml";
+    int melcnt = DhWenet::cntmel(dg->minblock);
+    int bnfcnt = DhWenet::cntbnf(melcnt);
+    WeAI*  awenet ;
+    awenet = new WeOpvn(fnovbin,fnovxml,melcnt,bnfcnt,4);
+    if(dg->weai_first){
+      WeAI* oldw = dg->weai_first;
+      dg->weai_first = awenet;
+      delete oldw;
+    }else{
+      dg->weai_first = awenet;
+    }
+    awenet->test();
+  }
+#endif
+  return 0;
+}
+
+int dhmfcc_initWenet(dhmfcc_t* dg,char* fnwenet){
+  if(dg->wenetfn) free(dg->wenetfn);   // re-initialization must not leak the old path
+  dg->wenetfn = strdup(fnwenet);
+
+  std::string fnonnx(fnwenet);
+  WeAI*  awenet ;
+    int melcnt = DhWenet::cntmel(dg->minblock);
+    int bnfcnt = DhWenet::cntbnf(melcnt);
+#ifdef WENETOPENV
+  if(dg->inited){
+    std::string fnovbin = fnonnx+"_ov.bin";
+    std::string fnovxml = fnonnx+"_ov.xml";
+    awenet = new WeOpvn(fnovbin,fnovxml,melcnt,bnfcnt,4);
+  }else{
+    awenet = new WeOnnx(fnwenet,melcnt,bnfcnt,4);
+  }
+#else
+    awenet = new WeOnnx(fnwenet,melcnt,bnfcnt,4);
+#endif
+  WeAI* bwenet = new WeOnnx(fnwenet,321,79,4);
+  if(dg->weai_first){
+    WeAI* oldw = dg->weai_first;
+    dg->weai_first = awenet;
+    delete oldw;
+  }else{
+    dg->weai_first = awenet;
+  }
+  if(dg->weai_common){
+    WeAI* oldw = dg->weai_common;
+    dg->weai_common = bwenet;
+    delete oldw;
+  }else{
+    dg->weai_common = bwenet;
+  }
+  awenet->test();
+  bwenet->test();
+  return awenet?0:-1;
+}
+
+uint64_t dhmfcc_newsession(dhmfcc_t* dg){
+  uint64_t sessid = ++dg->sessid;
+  PcmSession* sess = new PcmSession(sessid,dg->minoff,dg->minblock,dg->maxblock);
+  PcmSession* olds = dg->presess;
+  dg->presess = dg->cursess;
+  dg->cursess = sess;
+  if(olds)delete olds;
+  return sessid;
+}
+
+int dhmfcc_pushpcm(dhmfcc_t* dg,uint64_t sessid,char* buf,int size,int kind){
+  if(sessid!=dg->sessid)return -1;
+  if(!dg->running)return -2;
+  PcmSession* sess = dg->cursess;
+  if(!sess)return -3;
+  int rst =  0;
+  pthread_mutex_lock(&dg->pushmutex);
+  rst = sess->pushpcm(sessid,(uint8_t*)buf,size);
+  pthread_mutex_unlock(&dg->pushmutex);
+  if(rst>0){
+    if(sess->first()){
+      sess->runfirst(sessid,dg->weai_first);
+      uint64_t tick = jtimer_msstamp();
+      printf("====runfirst  %llu %llu \n",(unsigned long long)sessid,(unsigned long long)tick);
+    }
+    return 0;
+  }else{
+    return rst;
+  }
+}
+
+int dhmfcc_readpcm(dhmfcc_t* dg,uint64_t sessid,char* pcmbuf,int pcmlen,char* bnfbuf,int bnflen){
+  if(sessid!=dg->sessid)return -1;
+  if(!dg->running)return -2;
+  PcmSession* sess = dg->cursess;
+  if(!sess)return -3;
+  int rst = 0;
+  pthread_mutex_lock(&dg->readmutex);
+  rst =  sess->readnext(sessid,(uint8_t*)pcmbuf,pcmlen,(uint8_t*)bnfbuf,bnflen);
+  pthread_mutex_unlock(&dg->readmutex);
+  return rst;
+}
+
+int dhmfcc_consession(dhmfcc_t* dg,uint64_t sessid){
+  if(sessid!=dg->sessid)return -1;
+  if(!dg->running)return -2;
+  PcmSession* sess = dg->cursess;
+  if(!sess)return -3;
+  return sess->conpcm(sessid);
+}
+
+int dhmfcc_finsession(dhmfcc_t* dg,uint64_t sessid){
+  if(sessid!=dg->sessid)return -1;
+  if(!dg->running)return -2;
+  PcmSession* sess = dg->cursess;
+  if(!sess)return -3;
+  return sess->finpcm(sessid);
+}
+
+int dhmfcc_free(dhmfcc_t* dg){
+  dg->running = 0;
+  pthread_join(*dg->calcthread, NULL);
+  if(dg->weai_first){
+    delete dg->weai_first;
+    dg->weai_first = NULL;
+  }
+  if(dg->weai_common){
+    delete dg->weai_common;
+    dg->weai_common = NULL;
+  }
+  if(dg->cursess){
+    delete dg->cursess;
+    dg->cursess = NULL;
+  }
+  if(dg->presess){
+    delete dg->presess;
+    dg->presess = NULL;
+  }
+  pthread_mutex_destroy(&dg->pushmutex);
+  pthread_mutex_destroy(&dg->readmutex);
+  free(dg->calcthread);
+  free(dg->wenetfn);   // allocated by strdup in dhmfcc_initWenet; free(NULL) is a no-op
+  free(dg);
+  //
+  return 0;
+}
+
+struct dhunet_s{
+  int inited;
+  int rgb;
+  Mobunet     *munet; 
+};
+
+int dhunet_alloc(dhunet_t** pdg,int rgb){
+  dhunet_t* unet = (dhunet_t*)malloc(sizeof(dhunet_t));
+  memset(unet,0,sizeof(dhunet_t));
+  unet->rgb = 1;
+  *pdg = unet;
+  return 0;
+}
+
+int dhunet_initMunet(dhunet_t* dg,char* fnparam,char* fnbin,char* fnmsk){
+  if(dg->munet) delete dg->munet;   // allow re-initialization without leaking
+  dg->munet = new Mobunet(fnbin,fnparam,fnmsk,20,dg->rgb);
+  dg->inited = 1;
+  printf("===init munet \n");
+  return 0;
+}
+
+#define AIRUN_FLAG 1
+int dhunet_simprst(dhunet_t* dg,uint64_t sessid,uint8_t* bpic,int width,int height,int* box,uint8_t* bmsk,uint8_t* bfg,uint8_t* bnfbuf,int bnflen){
+  //printf("simprst gogogo %d \n",dg->inited);
+  if(!dg->inited)return -1;
+  if(bnflen!=STREAM_ALL_BNF)return -2;
+  if(!dg->munet)return -3;
+  int rst = 0;
+  JMat* mat_pic = new JMat(width,height,bpic);
+  JMat* mat_msk = bmsk?new JMat(width,height,bmsk):NULL;
+  JMat* mat_fg = bfg?new JMat(width,height,bfg):NULL;
+  JMat* feat = new JMat(STREAM_CNT_BNF,STREAM_BASE_BNF,(float*)bnfbuf,1);
+
+  MWorkMat wmat(mat_pic,mat_msk,box);
+  wmat.premunet();
+  JMat* mpic;
+  JMat* mmsk;
+  wmat.munet(&mpic,&mmsk);
+  //tooken
+#ifdef AIRUN_FLAG
+  uint64_t ticka = jtimer_msstamp();
+  rst = dg->munet->domodel(mpic, mmsk, feat);
+  uint64_t tickb = jtimer_msstamp();
+  uint64_t dist = tickb-ticka;
+  if(dist>40){
+    printf("===domodel %d dist %llu\n",rst,(unsigned long long)dist);
+  }
+#endif
+  if(mat_fg){
+    wmat.finmunet(mat_fg);
+  }else{
+    wmat.finmunet(mat_pic);
+  }
+  if(feat)delete feat;
+  delete mat_pic;
+  if(mat_fg)delete mat_fg;
+  if(mat_msk)delete mat_msk;
+  return 0;
+}
+
+int dhunet_free(dhunet_t* dg){
+  dg->inited = 0;
+  if(dg->munet){
+    delete dg->munet;
+    dg->munet = NULL;
+  }
+  free(dg);
+  return 0;
+
+}
+
+
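Each `dhmfcc_*` session call compares the caller's `sessid` with `dg->sessid` and returns `-1` on a mismatch. Starting a session with `dhmfcc_newsession` therefore invalidates any handle that an older producer or reader still holds. The single-threaded sketch below models only that contract, using a hypothetical `SessionGate` class. It leaves out the `presess` slot, which the real code presumably keeps so that the worker thread is not left holding a just-deleted session.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the session-id guard used by dhmfcc_pushpcm,
// dhmfcc_readpcm, dhmfcc_consession and dhmfcc_finsession.
class SessionGate {
public:
    // Like dhmfcc_newsession: bump the id; the first session gets id 1.
    uint64_t new_session() { return ++current_; }

    // Like the guard at the top of each call: -1 for a stale id, 0 otherwise.
    int check(uint64_t sessid) const { return sessid == current_ ? 0 : -1; }

private:
    uint64_t current_ = 0;
};
```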
--- a/duix-sdk/src/main/cpp/duix/gjduix.h
+++ b/duix-sdk/src/main/cpp/duix/gjduix.h
@ -0,0 +1,38 @@
+#ifndef GJDUIX_
+#define GJDUIX_
+
+#include <stdint.h>
+
+#ifdef __cplusplus
+extern "C"{
+#endif
+
+typedef struct dhmfcc_s dhmfcc_t;
+
+int dhmfcc_alloc(dhmfcc_t** pdg,int mincalc);
+int dhmfcc_initPcmex(dhmfcc_t* dg,int maxsize,int minoff ,int minblock ,int maxblock);
+int dhmfcc_initWenet(dhmfcc_t* dg,char* fnwenet); 
+
+uint64_t dhmfcc_newsession(dhmfcc_t* dg);
+int dhmfcc_pushpcm(dhmfcc_t* dg,uint64_t sessid,char* buf,int size,int kind);
+int dhmfcc_readpcm(dhmfcc_t* dg,uint64_t sessid,char* pcmbuf,int pcmlen,char* bnfbuf,int bnflen);
+int dhmfcc_finsession(dhmfcc_t* dg,uint64_t sessid);
+int dhmfcc_consession(dhmfcc_t* dg,uint64_t sessid);
+
+int dhmfcc_free(dhmfcc_t* dg);
+
+
+typedef struct dhunet_s dhunet_t;
+int dhunet_alloc(dhunet_t** pdg,int minrender);
+int dhunet_initMunet(dhunet_t* dg,char* fnparam,char* fnbin,char* fnmsk);
+int dhunet_simprst(dhunet_t* dg,uint64_t sessid,uint8_t* bpic,int width,int height,int* box,uint8_t* bmsk,uint8_t* bfg,uint8_t* bnfbuf,int bnflen);
+int dhunet_free(dhunet_t* pdg);
+
+
+
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
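`gjsimp.cpp`, which follows, replaces the two-slot `cursess`/`presess` scheme with a queue of retired sessions. `dhduix_newsession` swaps the new session in under the push and read mutexes and pushes the old one onto `slist` under `freemutex`, and the calc worker deletes queued sessions only while it is idle, so it never frees a session it is still using. Below is a simplified sketch of that hand-off that uses `std::mutex` and `std::unique_ptr` instead of pthreads and raw pointers. The `Retirer` class and the `Session` stand-in are hypothetical, and unlike the original, the sketch does not queue the empty slot left by the very first session.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <queue>

struct Session { uint64_t id; };  // stand-in for PcmSession

class Retirer {
public:
    // Swap in a new session; the old one is queued rather than deleted here.
    // Assumes a single thread calls new_session (next_id_ is unguarded).
    uint64_t new_session() {
        auto fresh = std::make_unique<Session>(Session{++next_id_});
        uint64_t id = fresh->id;
        std::unique_ptr<Session> old;
        {
            std::lock_guard<std::mutex> lk(cur_mu_);
            old = std::move(current_);
            current_ = std::move(fresh);
        }
        if (old) {
            std::lock_guard<std::mutex> lk(free_mu_);
            retired_.push(std::move(old));
        }
        return id;
    }

    // Called by the worker when idle: frees at most one retired session and
    // returns true if one was freed. The check and pop happen under the lock.
    bool reclaim_one() {
        std::unique_ptr<Session> victim;
        {
            std::lock_guard<std::mutex> lk(free_mu_);
            if (retired_.empty()) return false;
            victim = std::move(retired_.front());
            retired_.pop();
        }
        return true;  // victim is destroyed on return, outside the lock
    }

    std::size_t pending() {
        std::lock_guard<std::mutex> lk(free_mu_);
        return retired_.size();
    }

private:
    uint64_t next_id_ = 0;
    std::mutex cur_mu_, free_mu_;
    std::unique_ptr<Session> current_;
    std::queue<std::unique_ptr<Session>> retired_;
};
```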
--- a/duix-sdk/src/main/cpp/duix/gjsimp.cpp
+++ b/duix-sdk/src/main/cpp/duix/gjsimp.cpp
@ -0,0 +1,453 @@
+#include "gjsimp.h"
+#include <stdlib.h>
+#include <pthread.h>
+#include "dhwenet.h"
+#include "wenetai.h"
+#include "dhpcm.h"
+#include "munet.h"
+#include "malpha.h"
+#include <string.h>
+#include <queue>
+//#include "Log.h"
+
+
+struct dhduix_s{
+  int kind;
+  int rect;
+  int width;
+  int height;
+  int mincalc;
+  int minoff;  
+  int minblock;  
+  int maxblock;  
+  int inited;
+  char* wenetfn;
+
+  //DhWenet* wenet;
+  WeAI*   weai_first;
+  WeAI*   weai_common;
+  PcmSession* cursess;
+  //PcmSession* presess;
+  volatile uint64_t  sessid;
+
+  jmat_t    *mat_feat;
+  volatile int running;
+  pthread_t *calcthread;
+  pthread_mutex_t pushmutex;
+  pthread_mutex_t readmutex;
+  pthread_mutex_t freemutex;
+  std::queue<PcmSession*> *slist;  
+
+  int rgb;
+  Mobunet     *munet; 
+  JMat        *mat_pic;
+  JMat        *mat_fg;
+  JMat        *mat_msk;
+};
+
+static void *calcworker(void *arg){
+  dhduix_t* mfcc = (dhduix_t*)arg;
+  uint64_t sessid = 0;
+  while(mfcc->running){
+    int rst = 0;
+    PcmSession* sess = mfcc->cursess;
+    if(sess &&(sess->sessid()==mfcc->sessid)){
+      rst = sess->runcalc(mfcc->sessid,mfcc->weai_common,mfcc->mincalc);
+    }
+    if(rst!=1){
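+      // Check and pop under freemutex: dhduix_newsession pushes from another thread.
+      int popped = 0;
+      PcmSession* dead = NULL;
+      pthread_mutex_lock(&mfcc->freemutex);
+      if(!mfcc->slist->empty()){
+        dead = mfcc->slist->front();
+        mfcc->slist->pop();
+        popped = 1;
+      }
+      pthread_mutex_unlock(&mfcc->freemutex);
+      if(popped){
+        delete dead;   // may be NULL for the very first session; delete NULL is a no-op
+        jtimer_mssleep(10);
+      }else{
+        jtimer_mssleep(20);
+      }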
+    }else{
+      jtimer_mssleep(10);
+    }
+  }
+  return NULL;
+}
+
+int dhduix_alloc(dhduix_t** pdg,int mincalc,int width,int height){
+  dhduix_t* duix = (dhduix_t*)malloc(sizeof(dhduix_t));
+  memset(duix,0,sizeof(dhduix_t));
+  duix->mincalc = mincalc?mincalc:1;
+  duix->minoff = STREAM_BASE_MINOFF;
+  duix->minblock = STREAM_BASE_MINBLOCK;
+  duix->maxblock = STREAM_BASE_MAXBLOCK;
+  pthread_mutex_init(&duix->pushmutex,NULL);
+  pthread_mutex_init(&duix->readmutex,NULL);
+  pthread_mutex_init(&duix->freemutex,NULL);
+  duix->slist = new std::queue<PcmSession*>();
+  duix->calcthread = (pthread_t *)malloc(sizeof(pthread_t) );
+  duix->running = 1;
+  pthread_create(duix->calcthread, NULL, calcworker, (void*)duix);
+  duix->rgb = 1;
+  duix->width = width;
+  duix->height = height;
+  duix->mat_msk = new JMat(width,height);
+  duix->mat_fg = new JMat(width,height);
+  duix->mat_pic = new JMat(width,height);
+  //duix->mat_feat = jmat_alloc(20,STREAM_BASE_BNF,1,0,4,NULL);
+  duix->mat_feat = jmat_alloc(STREAM_BASE_BNF,20,1,0,4,NULL);
+  duix->kind = 168;
+  duix->rect = 160;
+  *pdg = duix;
+  return 0;
+}
+
+int dhduix_initPcmex(dhduix_t* dg,int maxsize,int minoff ,int minblock ,int maxblock,int rgb){
+  dg->minoff = minoff;
+  dg->minblock = minblock;
+  dg->maxblock = maxblock;
+  dg->inited = 1;
+#ifdef WENETOPENV
+  if(dg->wenetfn){
+    //
+    std::string fnonnx(dg->wenetfn);
+    std::string fnovbin = fnonnx+"_ov.bin";
+    std::string fnovxml = fnonnx+"_ov.xml";
+    int melcnt = DhWenet::cntmel(dg->minblock);
+    int bnfcnt = DhWenet::cntbnf(melcnt);
+    WeAI*  awenet ;
+    awenet = new WeOpvn(fnovbin,fnovxml,melcnt,bnfcnt,4);
+    if(dg->weai_first){
+      WeAI* oldw = dg->weai_first;
+      dg->weai_first = awenet;
+      delete oldw;
+    }else{
+      dg->weai_first = awenet;
+    }
+    awenet->test();
+  }
+#endif
+  dg->rgb = rgb;
+  return 0;
+}
+
+int dhduix_initWenet(dhduix_t* dg,char* fnwenet){
+  if(dg->wenetfn) free(dg->wenetfn);   // re-initialization must not leak the old path
+  dg->wenetfn = strdup(fnwenet);
+
+  std::string fnonnx(fnwenet);
+  WeAI*  awenet ;
+  int melcnt = DhWenet::cntmel(dg->minblock);
+  int bnfcnt = DhWenet::cntbnf(melcnt);
+#ifdef WENETOPENV
+  if(dg->inited){
+    std::string fnovbin = fnonnx+"_ov.bin";
+    std::string fnovxml = fnonnx+"_ov.xml";
+    awenet = new WeOpvn(fnovbin,fnovxml,melcnt,bnfcnt,4);
+  }else{
+    awenet = new WeOnnx(fnwenet,melcnt,bnfcnt,4);
+  }
+#else
+  awenet = new WeOnnx(fnwenet,melcnt,bnfcnt,4);
+#endif
+  WeAI* bwenet = new WeOnnx(fnwenet,321,79,4);
+  if(dg->weai_first){
+    WeAI* oldw = dg->weai_first;
+    dg->weai_first = awenet;
+    delete oldw;
+  }else{
+    dg->weai_first = awenet;
+  }
+  if(dg->weai_common){
+    WeAI* oldw = dg->weai_common;
+    dg->weai_common = bwenet;
+    delete oldw;
+  }else{
+    dg->weai_common = bwenet;
+  }
+  awenet->test();
+  bwenet->test();
+  return awenet?0:-1;
+}
+
+uint64_t dhduix_newsession(dhduix_t* dg){
+  uint64_t sessid = ++dg->sessid;
+  PcmSession* sess = new PcmSession(sessid,dg->minoff,dg->minblock,dg->maxblock);
+  //PcmSession* olds = dg->presess;
+  //dg->presess = dg->cursess;
+  //dg->cursess = sess;
+  //if(olds)delete olds;
+  pthread_mutex_lock(&dg->pushmutex);
+  pthread_mutex_lock(&dg->readmutex);
+  PcmSession* olds = dg->cursess;
+  dg->cursess = sess;
+  pthread_mutex_unlock(&dg->pushmutex);
+  pthread_mutex_unlock(&dg->readmutex);
+  pthread_mutex_lock(&dg->freemutex);
+  dg->slist->push(olds);
+  pthread_mutex_unlock(&dg->freemutex);
+  return sessid;
+}
+
+int dhduix_pushpcm(dhduix_t* dg,uint64_t sessid,char* buf,int size,int kind){
+  if(sessid!=dg->sessid)return -1;
+  if(!dg->running)return -2;
+  PcmSession* sess = dg->cursess;
+  if(!sess)return -3;
+  int rst =  0;
+  pthread_mutex_lock(&dg->pushmutex);
+  rst = sess->pushpcm(sessid,(uint8_t*)buf,size);
+  pthread_mutex_unlock(&dg->pushmutex);
+  if(rst>0){
+    if(sess->first()){
+      sess->runfirst(sessid,dg->weai_first);
+      uint64_t tick = jtimer_msstamp();
+      printf("====runfirst  %llu %llu \n",(unsigned long long)sessid,(unsigned long long)tick);
+    }
+    return 0;
+  }else{
+    return rst;
+  }
+}
+
+int dhduix_readpcm(dhduix_t* dg,uint64_t sessid,char* pcmbuf,int pcmlen,char* bnfbuf,int bnflen){
+  if(sessid!=dg->sessid)return -1;
+  if(!dg->running)return -2;
+  PcmSession* sess = dg->cursess;
+  if(!sess)return -3;
+  int rst = 0;
+  pthread_mutex_lock(&dg->readmutex);
+  rst =  sess->readnext(sessid,(uint8_t*)pcmbuf,pcmlen,(uint8_t*)bnfbuf,bnflen);
+  pthread_mutex_unlock(&dg->readmutex);
+  return rst;
+}
+
+int dhduix_consession(dhduix_t* dg,uint64_t sessid){
+  if(sessid!=dg->sessid)return -1;
+  if(!dg->running)return -2;
+  PcmSession* sess = dg->cursess;
+  if(!sess)return -3;
+  return sess->conpcm(sessid);
+}
+
+int dhduix_finsession(dhduix_t* dg,uint64_t sessid){
+  if(sessid!=dg->sessid)return -1;
+  if(!dg->running)return -2;
+  PcmSession* sess = dg->cursess;
+  if(!sess)return -3;
+  return sess->finpcm(sessid);
+}
+
+int dhduix_free(dhduix_t* dg){
+  dg->running = 0;
+  pthread_join(*dg->calcthread, NULL);
+  if(dg->slist){
+    pthread_mutex_lock(&dg->freemutex);
+    while(!dg->slist->empty()){
+      PcmSession* sess = dg->slist->front();
+      dg->slist->pop();
+      delete sess;
+    }
+    pthread_mutex_unlock(&dg->freemutex);
+    delete dg->slist;
+  }
+
+  if(dg->weai_first){
+    delete dg->weai_first;
+    dg->weai_first = NULL;
+  }
+  if(dg->weai_common){
+    delete dg->weai_common;
+    dg->weai_common = NULL;
+  }
+  if(dg->cursess){
+    delete dg->cursess;
+    dg->cursess = NULL;
+  }
+  //if(dg->presess){
+    //delete dg->presess;
+    //dg->presess = NULL;
+  //}
+  if(dg->munet){
+    delete dg->munet;
+    dg->munet = NULL;
+  }
+  if(dg->mat_fg){
+    delete dg->mat_fg;
+    dg->mat_fg = NULL;
+  }
+  if(dg->mat_pic){
+    delete dg->mat_pic;
+    dg->mat_pic = NULL;
+  }
+  if(dg->mat_msk){
+    delete dg->mat_msk;
+    dg->mat_msk = NULL;
+  }
+  pthread_mutex_destroy(&dg->pushmutex);
+  pthread_mutex_destroy(&dg->readmutex);
+  pthread_mutex_destroy(&dg->freemutex);
+  free(dg->calcthread);
+  jmat_free(dg->mat_feat);
+  free(dg);
+  //
+  return 0;
+}
+
+
+int dhduix_initMunet(dhduix_t* dg,char* fnparam,char* fnbin,char* fnmsk){
+  dg->munet = new Mobunet(fnbin,fnparam,fnmsk,20,dg->rgb);
+  dg->inited = 1;
+  printf("===init munet \n");
+  dg->kind = 168;
+  dg->rect = 160;
+  return 0;
+}
+
+int dhduix_initMunetex(dhduix_t* dg,char* fnparam,char* fnbin,char* fnmsk,int rect){
+  dg->munet = new Mobunet(fnbin,fnparam,fnmsk,20,dg->rgb);
+  dg->inited = 1;
+  if(rect==128){
+    dg->kind = 128;
+    dg->rect = 128;
+  }else{
+    dg->kind = 168;
+    dg->rect = 160;
+  }
+  printf("===init munet \n");
+  return 0;
+}
+
+int dhduix_simppcm(dhduix_t* dg,char* buf,int size,char* pre,int presize,char* bnf,int bnfsize){
+  if(!dg->running)return -2;
+  PcmFile* mfcc = new PcmFile(25,10,STREAM_BASE_MAXBLOCK,STREAM_BASE_MAXBLOCK*20);
+  mfcc->prepare(buf,size,pre,presize);
+  mfcc->process(-1,dg->weai_first);
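+  // read the BNF features into the caller's bnf buffer (not back into the PCM input)
+  int rst = mfcc->readbnf(bnf,bnfsize);
+  delete mfcc; // the temporary PcmFile was leaked before
+  return rst;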
+}
+
+int dhduix_allcnt(dhduix_t* dg,uint64_t sessid){
+  PcmSession* sess = dg->cursess;
+  if(!sess)return -3;
+  if(sess->sessid()!=sessid)return 0;
+  return sess->fileBlock();
+}
+
+int dhduix_readycnt(dhduix_t* dg,uint64_t sessid){
+  PcmSession* sess = dg->cursess;
+  if(!sess)return -3;
+  if(sess->sessid()!=sessid)return 0;
+  return sess->calcBlock();
+}
+
+
+#define AIRUN_FLAG 1
+int dhduix_fileinx(dhduix_t* dg,uint64_t sessid,char* fnpic,int* box,char* fnmsk,char* fnfg,int bnfinx,char* bimg,char* mskbuf,int imgsize){
+  if(sessid!=dg->sessid)return -1;
+  if(!dg->running)return -2;
+
+  uint64_t ticka = jtimer_msstamp();
+  std::string sfnpic(fnpic);
+  std::string sfnmsk(fnmsk);
+  std::string sfnfg(fnfg);
+  JMat* mat_pic = dg->mat_pic;
+  mat_pic->loadjpg(sfnpic,1);
+  uint8_t* bpic = (uint8_t*)mat_pic->data();
+  uint8_t* bmsk = NULL;
+  uint8_t* bfg = NULL;
+  JMat* mat_msk = NULL;
+  if(sfnmsk.length()){
+    mat_msk = dg->mat_msk;
+    mat_msk->loadjpg(sfnmsk,1);
+    bmsk = (uint8_t*)mat_msk->data();
+  }
+  JMat* mat_fg = NULL;
+  if(sfnfg.length()){
+    mat_fg = dg->mat_fg;
+    mat_fg->loadjpg(sfnfg,1);
+    bfg = (uint8_t*)mat_fg->data();
+  }
+  uint64_t tickb = jtimer_msstamp();
+  uint64_t dist = tickb-ticka;
+  //LOGD("tooken","===loadjpg %ld\n",dist);
+  int rst = 0;
+  if(box){
+    rst = dhduix_simpinx(dg,sessid, bpic,dg->width,dg->height, box, bmsk, bfg,bnfinx);
+  }else{
+    rst = dhduix_simpblend(dg,sessid, bpic,dg->width,dg->height,  bmsk, bfg);
+  }
+  int size = dg->width*dg->height*3;
+  if(bfg){
+    memcpy(bimg,bfg,size);
+  }else{
+    memcpy(bimg,bpic,size);
+  }
+  if(bmsk) memcpy(mskbuf,bmsk,size);
+  return rst;
+}
+
+int dhduix_simpinx(dhduix_t* dg,uint64_t sessid,uint8_t* bpic,int width,int height,int* box,uint8_t* bmsk,uint8_t* bfg,int inx){
+  if(sessid!=dg->sessid)return -1;
+  if(!dg->running)return -2;
+  PcmSession* sess = dg->cursess;
+  if(!sess)return -3;
+  int rst = 0;
+  int w = width?width:dg->width;
+  int h = height?height:dg->height;
+  pthread_mutex_lock(&dg->readmutex);
+  rst =  sess->readblock(sessid,dg->mat_feat,inx);
+  pthread_mutex_unlock(&dg->readmutex);
+  //printf("===readblock %d\n",rst);
+  if(rst>0){
+    rst = dhduix_simprst(dg,sessid, bpic,w,h, box, bmsk, bfg,(uint8_t*)dg->mat_feat->data,STREAM_ALL_BNF);
+    if(rst<0)return rst; // propagate render errors instead of reporting success
+    return 1;
+  }
+  return rst;
+}
+
+int dhduix_simpblend(dhduix_t* dg,uint64_t sessid,uint8_t* bpic,int width,int height,uint8_t* bmsk,uint8_t* bfg){
+  //
+  return 0;
+}
+
+int dhduix_simprst(dhduix_t* dg,uint64_t sessid,uint8_t* bpic,int width,int height,int* box,uint8_t* bmsk,uint8_t* bfg,uint8_t* bnfbuf,int bnflen){
+  //printf("simprst gogogo %d \n",dg->inited);
+  if(!dg->inited)return -1;
+  if(!dg->munet)return -3;
+  int rst = 0;
+  JMat* mat_pic = new JMat(width,height,bpic);
+  JMat* mat_msk = bmsk?new JMat(width,height,bmsk):NULL;
+  JMat* mat_fg = bfg?new JMat(width,height,bfg):NULL;
+  //read pcm
+  JMat* feat = new JMat(STREAM_CNT_BNF,STREAM_BASE_BNF,(float*)bnfbuf,1);
+
+//    MWorkMat wmat(mat_pic,mat_msk,box);
+  MWorkMat wmat(mat_pic, NULL,box,dg->kind);
+  wmat.premunet();
+  JMat* mpic;
+  JMat* mmsk;
+  wmat.munet(&mpic,&mmsk);
+  //tooken
+#ifdef AIRUN_FLAG
+  uint64_t ticka = jtimer_msstamp();
+  rst = dg->munet->domodel(mpic, mmsk, feat,dg->rect);
+  uint64_t tickb = jtimer_msstamp();
+  uint64_t dist = tickb-ticka;
+  //LOGD("tooken","===domodel %ld\n",dist);
+  if(dist>40){
+    printf("===domodel %d dist %llu\n",rst,(unsigned long long)dist);
+  }
+#endif
+  if(mat_fg){
+    wmat.finmunet(mat_fg);
+  }else{
+    wmat.finmunet(mat_pic);
+  }
+  if(feat)delete feat;
+  delete mat_pic;
+  if(mat_fg)delete mat_fg;
+  if(mat_msk)delete mat_msk;
+  return 0;
+}
+
+
--- a/duix-sdk/src/main/cpp/include/aicommon.h
+++ b/duix-sdk/src/main/cpp/include/aicommon.h
@ -0,0 +1,39 @@
+#pragma once
+
+//#define MFCC_OFFSET  6436
+#define MFCC_OFFSET  6400
+//##define MFCC_OFFSET  0
+#define MFCC_DEFRMS  0.1f
+#define MFCC_FPS    25
+#define MFCC_RATE   16000
+//#define MFCC_WAVCHUNK  960000
+#define MFCC_WAVCHUNK  560000
+//#define MFCC_WAVCHUNK  512
+
+//#define MFCC_MELBASE  6001
+#define MFCC_MELBASE  3501
+#define MFCC_MELCHUNK  80
+//#define MFCC_MELCHUNK  20
+
+//#define MFCC_BNFBASE  1499
+#define MFCC_BNFBASE  874
+#define MFCC_BNFCHUNK  256
+//input==== NodeArg(name='speech', type='tensor(float)', shape=['B', 'T', 80])
+//input==== NodeArg(name='speech_lengths', type='tensor(int32)', shape=['B'])
+//output==== NodeArg(name='encoder_out', type='tensor(float)', shape=['B', 'T_OUT', 'Addencoder_out_dim_2'])
+#define STREAM_BASE_MINOFF 10 
+#define STREAM_BASE_MINBLOCK 20
+#define STREAM_BASE_MAXBLOCK 50
+#define STREAM_BASE_TICK 40
+#define STREAM_BASE_PCM 1280
+#define STREAM_BASE_SAMP 640
+#define STREAM_BASE_BNF 256
+#define STREAM_CNT_BNF 20
+#define STREAM_OFF_BNF 20
+#define STREAM_ALL_BNF 20480
+#define STREAM_BASE_MEL 80
+#define STREAM_BASE_CNT 1500
+//#define STREAM_BASE_CNT 050
+#define STREAM_MFCC_FILL 10
+//#define STREAM_MFCC_FILL 5
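+
+// Derived stream sizes (a worked sketch; assumes 16 kHz, 16-bit, mono PCM as
+// described in the README, and float BNF features as read by dhduix_simprst):
+//   frame tick : STREAM_BASE_TICK = 40 ms        -> 1000 / 40 = 25 fps (MFCC_FPS)
+//   samples    : MFCC_RATE * 40 / 1000 = 640      (STREAM_BASE_SAMP)
+//   PCM bytes  : 640 samples * 2 bytes = 1280     (STREAM_BASE_PCM)
+//   BNF bytes  : STREAM_CNT_BNF * STREAM_BASE_BNF * sizeof(float)
+//              = 20 * 256 * 4 = 20480             (STREAM_ALL_BNF)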
+
--- a/duix-sdk/src/main/cpp/include/dhextctrl.h
+++ b/duix-sdk/src/main/cpp/include/dhextctrl.h
@ -0,0 +1,55 @@
+#ifndef GJ_EXTCTRL
+#define GJ_EXTCTRL
+
+#include <stdio.h>
+#include "dhextend.h"
+#include "gj_threadpool.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+  typedef void (*func_extprocess)(void *data);
+  typedef struct{
+    ext_env_t* env;
+    jqueue_t* q_msg;
+    jqueue_t* q_input;
+    jqueue_t* q_output;
+    uint64_t  tick_process;
+    ext_handle_t*   hnd_process;
+    func_extrun     fn_run;
+    func_extprocess fn_process;
+  }ext_process_t;
+
+  typedef struct{
+    ext_model_t*  asr_model;
+    ext_model_t*  chat_model;
+    ext_model_t*  tts_model;
+    ext_model_t*  bnf_model;
+    ext_model_t*  render_model;
+  }extmain_t;
+
+  typedef struct{
+    volatile  uint64_t  m_sessid;
+    volatile int m_running;
+    ext_env_t* env_sess;
+    ext_process_t* asr_proc;
+    ext_process_t* chat_proc;
+    ext_process_t* tts_proc;
+    threadpool_t*  pool;
+  }extsess_t;
+
+  typedef int (*func_inout)(uint64_t looptick,void* arg);
+
+  int ext_createsess( extmain_t* extmain,char* uuid,extsess_t** pext);
+  int ext_startsess(extsess_t* ext,func_inout fn_input,func_inout fn_output,void* tag); 
+  int ext_stopsess(extsess_t* ext); 
+  int ext_destroysess(extsess_t** pext);
+
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
--- a/duix-sdk/src/main/cpp/include/dhextend.h
+++ b/duix-sdk/src/main/cpp/include/dhextend.h
@ -0,0 +1,62 @@
+
+#ifndef GJ_BOTCORE
+#define GJ_BOTCORE
+#include "dh_mem.h"
+#include "dh_data.h"
+#include "dh_que.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+  typedef struct ext_handle_t     ext_handle_t;
+  typedef struct ext_env_t     ext_env_t;
+
+  typedef ext_handle_t* (*func_extcreate)(char* uuid,void* env,void* tag);
+  typedef int (*func_extdestroy)(ext_handle_t *exthandle);
+  typedef int  (*func_extupsess)(ext_handle_t *handle,uint64_t sessid);
+  typedef int  (*func_extstart)(ext_handle_t *handle);
+  typedef int  (*func_extstop)(ext_handle_t *handle);
+  typedef int  (*func_extrun)(ext_handle_t *handle,uint64_t sessid,jbuf_t* buf);
+  typedef int  (*func_extrunex)(ext_handle_t *handle,uint64_t sessid,jbuf_t** buf);
+
+  typedef struct ext_model_t{
+    int             m_id;
+    char*           m_name;
+    func_extcreate  fn_create;
+    func_extdestroy fn_destroy;
+  }ext_model_t;
+
+
+  struct ext_handle_t{
+    void            *ext_tag;
+    char            *m_uuid;
+    uint64_t        m_sessid;
+    func_extstart   fn_start;
+    func_extstop    fn_stop;
+    func_extupsess  fn_upsess;
+    func_extrun     fn_extrun;
+    func_extrunex   fn_extrunex;
+  };
+
+  struct ext_env_t{
+    uint64_t  m_sessid;
+    volatile int  m_running;
+    jqueue_t* q_arrext[16];//msg pcm asr chat answer tts mfcc render (indexed by INX_Q*)
+  };
+
+#define INX_QMSG 0
+#define INX_QPCM 1
+#define INX_QASR 2
+#define INX_QCHAT 3
+#define INX_QANSWER 4
+#define INX_QTTS 5
+#define INX_QMFCC 6
+#define INX_QRENDER 7
+
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
--- a/duix-sdk/src/main/cpp/include/dhextinc.h
+++ b/duix-sdk/src/main/cpp/include/dhextinc.h
@ -0,0 +1,22 @@
+
+#ifndef GJ_EXTINC
+#define GJ_EXTINC
+#include "dhextend.h"
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+
+
+ext_model_t*  load_funasrext(char* cfg);
+ext_model_t*  load_chatggmlext(char* cfg);
+ext_model_t*  load_piperext(char* cfg);
+ext_model_t* load_msasrext(char* cfg);
+
+ext_model_t* load_aliasrext(char* cfg);
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
--- a/duix-sdk/src/main/cpp/include/gj_dll.h
+++ b/duix-sdk/src/main/cpp/include/gj_dll.h
@ -0,0 +1,21 @@
+#ifndef __GJ_DLL_H__
+#define __GJ_DLL_H__
+
+#ifdef __cplusplus
+extern "C" {
+#endif
+#define GJLIB_EXPORT 1
+#if defined(GJLIB_EXPORT)
+    #if defined _WIN32 || defined __CYGWIN__
+        #define GJLIBAPI __declspec(dllexport)
+    #else
+        #define GJLIBAPI __attribute__((visibility("default")))
+    #endif
+#else
+    #define GJLIBAPI
+#endif
+
+#ifdef __cplusplus
+}
+#endif
+#endif
--- a/duix-sdk/src/main/cpp/include/gjduix.h
+++ b/duix-sdk/src/main/cpp/include/gjduix.h
@ -0,0 +1,38 @@
+#ifndef GJDUIX_
+#define GJDUIX_
+
+#include <stdint.h>
+
+#ifdef __cplusplus
+extern "C"{
+#endif
+
+typedef struct dhmfcc_s dhmfcc_t;
+
+int dhmfcc_alloc(dhmfcc_t** pdg,int mincalc);
+int dhmfcc_initPcmex(dhmfcc_t* dg,int maxsize,int minoff ,int minblock ,int maxblock);
+int dhmfcc_initWenet(dhmfcc_t* dg,char* fnwenet); 
+
+uint64_t dhmfcc_newsession(dhmfcc_t* dg);
+int dhmfcc_pushpcm(dhmfcc_t* dg,uint64_t sessid,char* buf,int size,int kind);
+int dhmfcc_readpcm(dhmfcc_t* dg,uint64_t sessid,char* pcmbuf,int pcmlen,char* bnfbuf,int bnflen);
+int dhmfcc_finsession(dhmfcc_t* dg,uint64_t sessid);
+int dhmfcc_consession(dhmfcc_t* dg,uint64_t sessid);
+
+int dhmfcc_free(dhmfcc_t* dg);
+
+
+typedef struct dhunet_s dhunet_t;
+int dhunet_alloc(dhunet_t** pdg,int minrender);
+int dhunet_initMunet(dhunet_t* dg,char* fnparam,char* fnbin,char* fnmsk);
+int dhunet_simprst(dhunet_t* dg,uint64_t sessid,uint8_t* bpic,int width,int height,int* box,uint8_t* bmsk,uint8_t* bfg,uint8_t* bnfbuf,int bnflen);
+int dhunet_free(dhunet_t* pdg);
+
+
+
+
+#ifdef __cplusplus
+}
+#endif
+
+#endif
--- a/duix-sdk/src/main/cpp/include/gjsimp.h
+++ b/duix-sdk/src/main/cpp/include/gjsimp.h
@ -0,0 +1,52 @@
+#ifndef GJSIMP
+#define GJSIMP
+
+#include <stdint.h>
+#ifdef __cplusplus
+extern "C"{
+#endif
+
+
+typedef struct dhduix_s dhduix_t;
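+
+// Typical call sequence, as exercised by iostest/testsimp.cpp (a sketch:
+// error handling omitted, argument values are the ones that test uses):
+//   dhduix_t* dg = NULL;
+//   dhduix_alloc(&dg, 20, width, height);
+//   dhduix_initWenet(dg, "model/wenet.onnx");
+//   dhduix_initPcmex(dg, 0, 10, 20, 50, 0);   // maxsize, minoff, minblock, maxblock, rgb
+//   dhduix_initMunetex(dg, fnparam, fnbin, fnmsk, 128); // 128, else kind 168 / rect 160
+//   uint64_t sess = dhduix_newsession(dg);
+//   dhduix_pushpcm(dg, sess, pcm, len, 0);    // 16 kHz, 16-bit, mono PCM
+//   if (dhduix_readycnt(dg, sess) > 0 &&
+//       dhduix_simpinx(dg, sess, bpic, width, height, box, bmsk, bfg, bnfinx) > 0)
+//     bnfinx++;                               // result appears to land in bfg (or bpic)
+//   dhduix_finsession(dg, sess);
+//   dhduix_free(dg);
+// pushpcm/readpcm/simpinx/consession/finsession return -1 if sessid is not the
+// current session, -2 if the instance is not running, -3 if no session exists.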
+
+int dhduix_alloc(dhduix_t** pdg,int mincalc,int width,int height);
+int dhduix_initPcmex(dhduix_t* dg,int maxsize,int minoff ,int minblock ,int maxblock,int rgb);
+int dhduix_initWenet(dhduix_t* dg,char* fnwenet); 
+int dhduix_initMunet(dhduix_t* dg,char* fnparam,char* fnbin,char* fnmsk);
+int dhduix_initMunetex(dhduix_t* dg,char* fnparam,char* fnbin,char* fnmsk,int rect);
+
+uint64_t dhduix_newsession(dhduix_t* dg);
+
+int dhduix_pushpcm(dhduix_t* dg,uint64_t sessid,char* buf,int size,int kind);
+int dhduix_readpcm(dhduix_t* dg,uint64_t sessid,char* pcmbuf,int pcmlen,char* bnfbuf,int bnflen);
+int dhduix_simprst(dhduix_t* dg,uint64_t sessid,uint8_t* bpic,int width,int height,int* box,uint8_t* bmsk,uint8_t* bfg,uint8_t* bnfbuf,int bnflen);
+
+int dhduix_allcnt(dhduix_t* dg,uint64_t sessid);
+int dhduix_readycnt(dhduix_t* dg,uint64_t sessid);
+int dhduix_simpinx(dhduix_t* dg,uint64_t sessid,uint8_t* bpic,int width,int height,int* box,uint8_t* bmsk,uint8_t* bfg,int bnfinx);
+int dhduix_fileinx(dhduix_t* dg,uint64_t sessid,char* fnpic,int* box,char* fnmsk,char* fnfg,int bnfinx,char* bimg,char* mskbuf,int imgsize);
+int dhduix_simpblend(dhduix_t* dg,uint64_t sessid,uint8_t* bpic,int width,int height,uint8_t* bmsk,uint8_t* bfg);
+
+int dhduix_simppcm(dhduix_t* dg,char* buf,int size,char* pre,int presize,char* bnf,int bnfsize);
+
+
+int dhduix_finsession(dhduix_t* dg,uint64_t sessid);
+int dhduix_consession(dhduix_t* dg,uint64_t sessid);
+
+
+
+int dhduix_free(dhduix_t* dg);
+
+
+
+
+
+
+
+
+#ifdef __cplusplus
+}
+#endif
+
+
+#endif
--- a/duix-sdk/src/main/cpp/iostest/testduix.cpp
+++ b/duix-sdk/src/main/cpp/iostest/testduix.cpp
@ -0,0 +1,129 @@
+#include <stdlib.h>
+#include <string>
+#include <stdio.h>
+#include "gjduix.h"
+#include "jmat.h"
+#include <pthread.h>
+#include "dh_data.h"
+
+
+static volatile int g_running = 0;
+static volatile uint64_t g_sessid = 0;
+static void* mfccworker(void* arg){
+
+  static  uint64_t sessid ;
+  for(int k=0;k<1;k++){
+    dhmfcc_t* mfcc = (dhmfcc_t*)arg;
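+    FILE* g_wavfile = fopen("data/b10.wav","rb");
+    if(!g_wavfile){
+      printf("===open data/b10.wav failed\n");
+      return NULL;
+    }
+    fseek(g_wavfile,44,SEEK_SET); // skip the 44-byte canonical WAV header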
+    sessid = dhmfcc_newsession(mfcc);
+    g_sessid = sessid;
+    int psize = 1000;
+    int tickcnt = 0;
+    int kkk = 0;
+    char* pcm = (char*)malloc(psize);
+    while(sessid == g_sessid){
+      int readpcm = fread(pcm,1,psize,g_wavfile);
+      if(readpcm<1)break;
+      dhmfcc_pushpcm(mfcc,sessid,pcm,readpcm,0);
+      tickcnt += readpcm;
+      uint64_t tick = jtimer_msstamp();
+      //printf("====push %d %ld \n",tickcnt,tick);
+      kkk++;
+      /*
+      if(kkk%100==99){
+        sessid = dhmfcc_newsession(mfcc);
+        g_sessid = sessid;
+      }
+      */
+      jtimer_mssleep(5);
+    }
+    jtimer_mssleep(10000);
+    dhmfcc_finsession(mfcc,sessid);
+    free(pcm);
+    fclose(g_wavfile);
+    printf("===finish\n");
+  }
+  return NULL;
+}
+
+
+int main(int argc,char** argv){
+  dhmfcc_t* mfcc = NULL;
+  int rst = 0;
+  rst = dhmfcc_alloc(&mfcc,2);
+  //char* fnwenet = "model/wenet.onnx";
+  char* fnwenet = "model/wenet.onnx";
+  rst = dhmfcc_initWenet(mfcc,fnwenet);
+  rst = dhmfcc_initPcmex(mfcc,0,10,20,50);
+  dhunet_t* unet = NULL;
+  rst = dhunet_alloc(&unet,20);
+  rst = dhunet_initMunet(unet,"model/xinyan_opt.param","model/xinyan_opt.bin","model/weight_168u.bin");
+
+  std::string fnpic = "data/xinyan.jpg";
+  std::string fnmsk = "data/m1.jpg";
+  std::string fnfg = "data/xinyan.jpg";
+  JMat* mat_msk = new JMat();
+  mat_msk->loadjpg(fnmsk,1);
+  JMat* mat_pic = new JMat();
+  mat_pic->loadjpg(fnpic,1);
+  JMat* mat_fg = new JMat();
+  mat_fg->loadjpg(fnfg,1);
+  int width = mat_pic->width();
+  int height = mat_pic->height();
+  int m_boxs[4];
+  m_boxs[0]=170;m_boxs[2]=382;m_boxs[1]=382;m_boxs[3]=592;
+  uint8_t* bpic = (uint8_t*)mat_pic->data();
+  uint8_t* bmsk = (uint8_t*)mat_msk->data();
+  uint8_t* bfg = (uint8_t*)mat_fg->data();
+  int* box = m_boxs;
+  int pcmsize = 1280;
+  char* pcm = (char*)malloc(1280);
+  int bnfsize = 1024*20;
+  char* bnf = (char*)malloc(1024*20);
+  pthread_t audtrd;
+  pthread_create(&audtrd, NULL, mfccworker, (void*)mfcc);
+  //mfccworker(mfcc);
+
+  printf("====render\n");
+  //getchar();
+  while(1){
+    if(!g_sessid){
+      printf("+");
+      //cv::waitKey(40);
+      jtimer_mssleep(40);
+      continue;
+    }
+    rst = dhmfcc_readpcm(mfcc,g_sessid,pcm,pcmsize,bnf,bnfsize);
+    printf("===readpcm %llu %d\n",(unsigned long long)g_sessid,rst);
+    if(rst>0){
+      uint64_t tick = jtimer_msstamp();
+      printf("====read  %llu \n",(unsigned long long)tick);
+      rst = dhunet_simprst(unet,g_sessid, bpic,width,height, box, bmsk, bfg, (uint8_t*)bnf,bnfsize);
+      printf("===simprst %d\n",rst);
+      mat_fg->show("aaa");
+      cv::waitKey(30);
+      jtimer_mssleep(40);
+    }else if(rst < 0){
+      break;
+    }else{
+      //cv::waitKey(40);
+      jtimer_mssleep(40);
+    }
+  }
+  g_sessid = 0;
+  pthread_join(audtrd,NULL);
+  printf("====exit\n");
+  //
+  rst = dhmfcc_free(mfcc);
+  printf("====exitmfcc\n");
+  /*
+  rst = dhunet_free(unet);
+  delete mat_pic;
+  delete mat_msk;
+  delete mat_fg;
+  */
+  free(pcm);
+  free(bnf);
+  return 0;
+}
--- a/duix-sdk/src/main/cpp/iostest/testsimp.cpp
+++ b/duix-sdk/src/main/cpp/iostest/testsimp.cpp
@ -0,0 +1,200 @@
+#include <stdlib.h>
+#include <string>
+#include <stdio.h>
+#include "gjsimp.h"
+#include "jmat.h"
+#include <pthread.h>
+#include "dh_data.h"
+
+
+static volatile int g_running = 0;
+static volatile uint64_t g_sessid = 0;
+static void* mfccworker(void* arg){
+
+  static  uint64_t sessid ;
+  for(int k=0;k<1;k++){
+    dhduix_t* mfcc = (dhduix_t*)arg;
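+    FILE* g_wavfile = fopen("data/b10.wav","rb");
+    if(!g_wavfile){
+      printf("===open data/b10.wav failed\n");
+      return NULL;
+    }
+    fseek(g_wavfile,44,SEEK_SET); // skip the 44-byte canonical WAV header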
+    sessid = dhduix_newsession(mfcc);
+    g_sessid = sessid;
+    int psize = 1000;
+    int tickcnt = 0;
+    int kkk = 0;
+    char* pcm = (char*)malloc(psize);
+    while(sessid == g_sessid){
+      int readpcm = fread(pcm,1,psize,g_wavfile);
+      if(readpcm<1)break;
+      dhduix_pushpcm(mfcc,sessid,pcm,readpcm,0);
+      tickcnt += readpcm;
+      uint64_t tick = jtimer_msstamp();
+      //printf("====push %d %ld \n",tickcnt,tick);
+      kkk++;
+      /*
+         if(kkk%100==99){
+         sessid = dhduix_newsession(mfcc);
+         g_sessid = sessid;
+         }
+         */
+      jtimer_mssleep(5);
+    }
+    jtimer_mssleep(10000);
+    dhduix_finsession(mfcc,sessid);
+    free(pcm);
+    fclose(g_wavfile);
+    printf("===finish\n");
+  }
+  return NULL;
+}
+
+
+int mainmemcheck(int argc,char** argv){
+  dhduix_t* dg = NULL;
+  int rst = 0;
+  int width = 540;
+  int height = 720;
+  rst = dhduix_alloc(&dg,20,width,height);
+  //char* fnwenet = "model/wenet.onnx";
+  rst = dhduix_initPcmex(dg,0,10,20,50,0);
+  rst = dhduix_initMunetex(dg,"model/xinyan_opt.param","model/xinyan_opt.bin","model/weight_168u.bin",128);
+  //char* fnwenet = "model/wenet.onnx";
+  char* fnwenet = "model/wenet.onnx";
+  rst = dhduix_initWenet(dg,fnwenet);
+  char* pcm = (char*)calloc(1,102400); // zero-filled (silence) so the memory check does not flag uninitialised reads
+
+  std::string fnpic = "data/xinyan.jpg";
+  std::string fnmsk = "data/m1.jpg";
+  std::string fnfg = "data/xinyan.jpg";
+  JMat* mat_msk = new JMat();
+  mat_msk->loadjpg(fnmsk,1);
+  JMat* mat_pic = new JMat();
+  mat_pic->loadjpg(fnpic,1);
+  JMat* mat_fg = new JMat();
+  mat_fg->loadjpg(fnfg,1);
+  int m_boxs[4];
+  m_boxs[0]=170;m_boxs[2]=382;m_boxs[1]=382;m_boxs[3]=592;
+  uint8_t* bpic = (uint8_t*)mat_pic->data();
+  uint8_t* bmsk = (uint8_t*)mat_msk->data();
+  uint8_t* bfg = (uint8_t*)mat_fg->data();
+  int* box = m_boxs;
+  for(int m=0;m<10;m++){
+    g_sessid = dhduix_newsession(dg);
+    for(int k=0;k<100;k++){
+      dhduix_pushpcm(dg,g_sessid,pcm,102400,0);
+      int allcnt = dhduix_allcnt(dg,g_sessid);
+      printf("===allcnt %d\n",allcnt);
+    }
+    int readycnt = dhduix_readycnt(dg,g_sessid);
+    while(readycnt==0){
+      jtimer_mssleep(10);
+      readycnt = dhduix_readycnt(dg,g_sessid); // re-poll, otherwise this loop never exits
+    }
+    for(int i=0;i<100;i++){
+      readycnt = dhduix_readycnt(dg,g_sessid);
+      //printf("===readycnt %d\n",readycnt);
+      rst = dhduix_simpinx(dg,g_sessid, bpic,width,height, box, bmsk, bfg, i);
+      printf("==simp %d\n",rst);
+      jtimer_mssleep(10);
+      if(rst<0)break;
+    }
+    dhduix_finsession(dg,g_sessid);
+  }
+  free(pcm);
+  delete mat_pic;
+  delete mat_msk;
+  delete mat_fg;
+  dhduix_free(dg);
+  return 0;
+}
+
+int main(int argc,char** argv){
+  dhduix_t* dg = NULL;
+  int rst = 0;
+  int width = 1080;
+  int height = 1920;
+  rst = dhduix_alloc(&dg,20,width,height);
+  //char* fnwenet = "model/wenet.onnx";
+  char* fnwenet = "model/wenet.onnx";
+  rst = dhduix_initWenet(dg,fnwenet);
+  rst = dhduix_initPcmex(dg,0,10,20,50,0);
+  rst = dhduix_initMunetex(dg,
+    "mdl128/pro128/dh_model.param",
+      "mdl128/pro128/dh_model.bin","model/weight_168u.bin",128);
+
+  //std::string fnpic = "data/xinyan.jpg";
+  //std::string fnmsk = "data/m1.jpg";
+  std::string fnpic = "mdl128/pro128/raw_jpgs/1.sij";
+  std::string fnmsk = "mdl128/pro128/pha/1.sij";
+  std::string fnfg = "mdl128/pro128/raw_sg/1.sij";
+  //std::string fnfg = "data/xinyan.jpg";
+  JMat* mat_msk = new JMat();
+  mat_msk->loadjpg(fnmsk,1);
+  JMat* mat_pic = new JMat();
+  mat_pic->loadjpg(fnpic,1);
+  JMat* mat_fg = new JMat();
+  mat_fg->loadjpg(fnfg,1);
+  int m_boxs[4];
+  //m_boxs[0]=170;m_boxs[2]=382;m_boxs[1]=382;m_boxs[3]=592;
+  m_boxs[0]=414;m_boxs[2]=669;m_boxs[1]=925;m_boxs[3]=1180;
+  uint8_t* bpic = (uint8_t*)mat_pic->data();
+  uint8_t* bmsk = (uint8_t*)mat_msk->data();
+  uint8_t* bfg = (uint8_t*)mat_fg->data();
+  int* box = m_boxs;
+  int pcmsize = 1280;
+  char* pcm = (char*)malloc(1280);
+  int bnfsize = 1024*20;
+  char* bnf = (char*)malloc(1024*20);
+  pthread_t audtrd;
+  pthread_create(&audtrd, NULL, mfccworker, (void*)dg);
+  //mfccworker(mfcc);
+
+  printf("====render\n");
+  //getchar();
+  int bnfinx = 0;
+  while(1){
+    if(!g_sessid){
+      printf("+");
+      //cv::waitKey(40);
+      jtimer_mssleep(40);
+      continue;
+    }
+    int readycnt = dhduix_readycnt(dg,g_sessid);
+    printf("====readycnt %d\n",readycnt);
+    if(!readycnt){
+      jtimer_mssleep(40);
+      continue;
+    }
+    rst = 1;//dhduix_readpcm(dg,g_sessid,pcm,pcmsize,bnf,bnfsize);
+    printf("===readpcm %llu %d\n",(unsigned long long)g_sessid,rst);
+    if(rst>0){
+      uint64_t tick = jtimer_msstamp();
+      //printf("====read  %ld \n",tick);
+      //rst = dhduix_simprst(dg,g_sessid, bpic,width,height, box, bmsk, bfg, (uint8_t*)bnf,bnfsize);
+      rst = dhduix_simpinx(dg,g_sessid, bpic,width,height, box, bmsk, bfg, bnfinx);
+      if(rst>0)bnfinx ++;
+      printf("===simprst %d\n",rst);
+      mat_fg->show("aaa");
+      cv::waitKey(20);
+      //jtimer_mssleep(40);
+    }else if(rst < 0){
+      break;
+    }else{
+      //cv::waitKey(40);
+      jtimer_mssleep(40);
+    }
+  }
+  g_sessid = 0;
+  pthread_join(audtrd,NULL);
+  printf("====exit\n");
+  //
+  rst = dhduix_free(dg);
+  printf("====exitmfcc\n");
+  delete mat_pic;
+  delete mat_msk;
+  delete mat_fg;
+  free(pcm);
+  free(bnf);
+  return 0;
+}
--- a/duix-sdk/src/main/cpp/mk/Android.mk64
+++ b/duix-sdk/src/main/cpp/mk/Android.mk64
@ -0,0 +1,37 @@
+#/****************************************************************************
+#*   Cartoonifier, for Android.
+#*****************************************************************************
+#*   by Shervin Emami, 5th Dec 2012 (shervin.emami@gmail.com)
+#*   http://www.shervinemami.info/
+#*****************************************************************************
+#*   Ch1 of the book "Mastering OpenCV with Practical Computer Vision Projects"
+#*   Copyright Packt Publishing 2012.
+#*   http://www.packtpub.com/cool-projects-with-opencv/book
+#****************************************************************************/
+
+
+LOCAL_PATH := $(call my-dir)
+
+
+include $(CLEAR_VARS)
+
+
+
+LOCAL_SRC_FILES  += src/kmatarm.cpp
+
+LOCAL_ARM_NEON := true
+LOCAL_MODULE := facedetect
+LOCAL_LDLIBS +=  -llog -ldl -lm -lmediandk
+LOCAL_LDLIBS += -ljnigraphics -fopenmp
+LOCAL_CFLAGS += -fpermissive
+LOCAL_CPPFLAGS += -fpermissive
+#LOCAL_CFLAGS += -ftree-vectorizer-verbose=2
+LOCAL_CPPFLAGS += -std=c++17
+LOCAL_LDLIBS += -lstdc++
+
+LOCAL_C_INCLUDES += $(LOCAL_PATH)
+LOCAL_C_INCLUDES += include
+LOCAL_C_INCLUDES += opencv-mobile-4.6.0-android/sdk/native/jni/include/
+LOCAL_C_INCLUDES += ncnn-20221128-android-vulkan-shared/arm64-v8a/include/ncnn
+
+include $(BUILD_SHARED_LIBRARY)
--- a/duix-sdk/src/main/cpp/mk/android.sh
+++ b/duix-sdk/src/main/cpp/mk/android.sh
@ -0,0 +1,17 @@
+ANDROID_NDK=~/tools/android-ndk-r25c
+TOOLCHAIN=$ANDROID_NDK/build/cmake/android.toolchain.cmake
+BUILD_DIR=android-arm64
+mkdir -p $BUILD_DIR
+cd $BUILD_DIR
+#-G Ninja # fail
+cmake \
+    -DCMAKE_TOOLCHAIN_FILE=$TOOLCHAIN \
+    -DANDROID_LD=lld \
+    -DANDROID_ABI="arm64-v8a" \
+    -DANDROID_PLATFORM=android-24 \
+    -DCMAKE_BUILD_TYPE=Release \
+    -DPPLCV_USE_AARCH64=ON \
+    ..
+
+# -DHPCC_USE_AARCH64=ON \
+
--- a/duix-sdk/src/main/cpp/mk/bt
+++ b/duix-sdk/src/main/cpp/mk/bt
@ -0,0 +1,58 @@
+g++ -g  \
+    -Iinclude -Ibase -Irender -Idigit -Iaisdk \
+    -I/usr/include/opencv4/ \
+		   -Ithird/x86/include/ \
+		   -Ithird/x86/include/ncnn/  \
+		   -Ithird/x86/include/onnx/ \
+		   -Ithird/x86/include/turbojpeg/ \
+    aisdk/jmat.cpp \
+    src/kmatx86.cpp \
+    aisdk/wavreader.cpp \
+    aisdk/wenet.cpp \
+    aisdk/aimodel.cpp \
+    aisdk/scrfd.cpp \
+    aisdk/pfpld.cpp \
+    aisdk/munet.cpp \
+    aisdk/malpha.cpp \
+    aisdk/wavcache.cpp \
+    aisdk/blendgram.cpp \
+    aisdk/face_utils.cpp \
+    digit/netwav.cpp \
+    digit/looper.cpp \
+    digit/netcurl.cpp \
+    digit/GRender.cpp \
+    digit/GDigit.cpp \
+    digit/dispatchqueue.cpp \
+    base/BaseRenderHelper.cpp \
+    base/AudioTrack.cpp \
+    render/EglRenderer.cpp \
+    render/RgbVideoRenderer.cpp \
+    render/SurfaceVideoRenderer.cpp \
+    render/RenderHelper.cpp \
+    render/AudioRenderer.cpp \
+    render/GlesProgram.cpp \
+    base/Log.cpp \
+    base/FrameSource.cpp \
+    base/MediaData.cpp \
+    base/MessageSource.cpp \
+    base/MessageHelper.cpp \
+    base/LoopThread.cpp \
+    base/XThread.cpp \
+    base/XTick.c \
+    base/cJSON.c \
+    base/dh_mem.c \
+    digit/grtcfg.c \
+    base/LoopThreadHelper.cpp \
+    linux/linuxtest.cpp \
+    lib/libpplcv_static.a \
+    lib/libpplcommon_static.a \
+    -fpermissive   -Wwrite-strings \
+        -Llib \
+        -L/usr/lib/x86_64-linux-gnu/ \
+        -Lthird/ncnn-20221128-android-vulkan-shared/x86_64/lib/ \
+	   -Lthird/x86/lib	\
+       -ljpeg -lturbojpeg \
+		-lopencv_core -lopencv_dnn -lopencv_imgcodecs -lopencv_imgproc -lopencv_highgui -lopencv_videoio \
+		-lonnxruntime -lncnn -lcurl \
+        -lEGL -lOpenGL -lGLESv2 -lX11 \
+        -fopenmp
--- a/duix-sdk/src/main/cpp/mk/exbuildso64
+++ b/duix-sdk/src/main/cpp/mk/exbuildso64
@ -0,0 +1,4 @@
+ndk-build NDK_PROJECT_PATH=. APP_BUILD_SCRIPT=./Android.mk64 APP_PLATFORM=android-26 APP_STL=c++_static APP_CPPFLAGS=-fexceptions APP_CFLAGS=-Wno-error APP_ABI=arm64-v8a
+#arm64-v8a
+#armeabi-v7a
+
--- a/duix-sdk/src/main/cpp/third/arm/arm64-v8a/ffmpeg-lite/build_free_arm64_lite.sh
+++ b/duix-sdk/src/main/cpp/third/arm/arm64-v8a/ffmpeg-lite/build_free_arm64_lite.sh
@ -0,0 +1,119 @@
+#!/bin/sh
+# build.sh
+# Script that successfully builds FFmpeg on Linux.
+# Note: Linux and Windows line endings differ (\r\n on Windows); convert the file with dos2unix first.
+make clean
+export NDK=~/work/android-ndk-r15c-linux-x86_64/android-ndk-r15c
+export PREBUILT=$NDK/toolchains/aarch64-linux-android-4.9/prebuilt
+export PLATFORM=$NDK/platforms/android-21/arch-arm64
+export PREFIX=../fflib/free-arm64-lite
+build_one(){
+./configure --target-os=android --prefix=$PREFIX \
+--enable-cross-compile \
+--enable-runtime-cpudetect \
+--arch=aarch64 \
+--cross-prefix=$PREBUILT/linux-x86_64/bin/aarch64-linux-android- \
+--cc=$PREBUILT/linux-x86_64/bin/aarch64-linux-android-gcc \
+--nm=$PREBUILT/linux-x86_64/bin/aarch64-linux-android-nm \
+--sysroot=$PLATFORM \
+--disable-gpl --disable-nonfree \
+--enable-shared --enable-static --enable-small \
+--disable-doc --disable-ffprobe --disable-ffplay --disable-debug \
+--enable-jni \
+--enable-mediacodec \
+--disable-avdevice \
+--enable-avcodec \
+--enable-avformat \
+--enable-avutil \
+--enable-swresample \
+--enable-swscale \
+--disable-postproc \
+--enable-avfilter \
+--disable-avresample \
+--disable-decoders \
+--enable-decoder=aac \
+--enable-decoder=aac_latm \
+--enable-decoder=flv \
+--enable-decoder=h264 \
+--enable-decoder=mp3* \
+--enable-decoder=vp6f \
+--enable-decoder=flac \
+--enable-decoder=hevc \
+--enable-decoder=vp8 \
+--enable-decoder=vp9 \
+--enable-decoder=amrnb \
+--enable-decoder=amrwb \
+--enable-decoder=mjpeg \
+--enable-decoder=png \
+--enable-decoder=h264_mediacodec \
+--enable-hwaccel=h264_mediacodec \
+--disable-encoders \
+--enable-encoder=aac \
+--enable-encoder=h264 \
+--enable-encoder=hevc \
+--enable-encoder=png \
+--enable-encoder=mjpeg \
+--disable-demuxers \
+--enable-demuxer=aac \
+--enable-demuxer=concat \
+--enable-demuxer=data \
+--enable-demuxer=flv \
+--enable-demuxer=hls \
+--enable-demuxer=live_flv \
+--enable-demuxer=mov \
+--enable-demuxer=mp3 \
+--enable-demuxer=mpegps \
+--enable-demuxer=mpegts \
+--enable-demuxer=mpegvideo \
+--enable-demuxer=flac \
+--enable-demuxer=hevc \
+--enable-demuxer=webm_dash_manifest \
+--enable-demuxer=rtsp \
+--enable-demuxer=rtp \
+--enable-demuxer=h264 \
+--enable-demuxer=mp4 \
+--enable-demuxer=image2 \
+--disable-muxers \
+--enable-muxer=rtsp \
+--enable-muxer=rtp \
+--enable-muxer=flv \
+--enable-muxer=h264 \
+--enable-muxer=mp4 \
+--enable-muxer=hevc \
+--enable-muxer=image2 \
+--disable-parsers \
+--enable-parser=aac \
+--enable-parser=aac_latm \
+--enable-parser=h264 \
+--enable-parser=flac \
+--enable-parser=hevc \
+--enable-protocols \
+--enable-protocol=async \
+--disable-protocol=bluray \
+--disable-protocol=concat \
+--disable-protocol=crypto \
+--disable-protocol=ffrtmpcrypt \
+--enable-protocol=ffrtmphttp \
+--disable-protocol=gopher \
+--disable-protocol=icecast \
+--disable-protocol=librtmp* \
+--disable-protocol=libssh \
+--disable-protocol=md5 \
+--disable-protocol=mmsh \
+--disable-protocol=mmst \
+--disable-protocol=rtmp* \
+--enable-protocol=rtmp \
+--enable-protocol=rtmpt \
+--disable-protocol=rtp \
+--disable-protocol=sctp \
+--disable-protocol=srtp \
+--disable-protocol=subfile \
+--disable-protocol=unix \
+--disable-indevs \
+--disable-outdevs \
+--disable-stripping \
+--enable-asm
+}
+build_one
+make
+make install
--- a/duix-sdk/src/main/cpp/third/arm/armeabi-v7a/ffmpeg-lite/build_free_arm_lite.sh
+++ b/duix-sdk/src/main/cpp/third/arm/armeabi-v7a/ffmpeg-lite/build_free_arm_lite.sh
@ -0,0 +1,119 @@
+#!/bin/sh
+# build.sh
+# Script that successfully builds FFmpeg on Linux.
+# Note: Linux and Windows line endings differ (\r\n on Windows); convert the file with dos2unix first.
+make clean
+export NDK=~/work/android-ndk-r15c-linux-x86_64/android-ndk-r15c
+export PREBUILT=$NDK/toolchains/arm-linux-androideabi-4.9/prebuilt
+export PLATFORM=$NDK/platforms/android-21/arch-arm
+export PREFIX=../fflib/free-arm-lite
+build_one(){
+./configure --target-os=android --prefix=$PREFIX \
+--enable-cross-compile \
+--enable-runtime-cpudetect \
+--arch=arm \
+--cross-prefix=$PREBUILT/linux-x86_64/bin/arm-linux-androideabi- \
+--cc=$PREBUILT/linux-x86_64/bin/arm-linux-androideabi-gcc \
+--nm=$PREBUILT/linux-x86_64/bin/arm-linux-androideabi-nm \
+--sysroot=$PLATFORM \
+--disable-gpl --disable-nonfree \
+--enable-shared --enable-static --enable-small \
+--disable-doc --disable-ffprobe --disable-ffplay --disable-debug \
+--enable-jni \
+--enable-mediacodec \
+--disable-avdevice \
+--enable-avcodec \
+--enable-avformat \
+--enable-avutil \
+--enable-swresample \
+--enable-swscale \
+--disable-postproc \
+--enable-avfilter \
+--disable-avresample \
+--disable-decoders \
+--enable-decoder=aac \
+--enable-decoder=aac_latm \
+--enable-decoder=flv \
+--enable-decoder=h264 \
+--enable-decoder=mp3* \
+--enable-decoder=vp6f \
+--enable-decoder=flac \
+--enable-decoder=hevc \
+--enable-decoder=vp8 \
+--enable-decoder=vp9 \
+--enable-decoder=amrnb \
+--enable-decoder=amrwb \
+--enable-decoder=mjpeg \
+--enable-decoder=png \
+--enable-decoder=h264_mediacodec \
+--enable-hwaccel=h264_mediacodec \
+--disable-encoders \
+--enable-encoder=aac \
+--enable-encoder=h264 \
+--enable-encoder=hevc \
+--enable-encoder=png \
+--enable-encoder=mjpeg \
+--disable-demuxers \
+--enable-demuxer=aac \
+--enable-demuxer=concat \
+--enable-demuxer=data \
+--enable-demuxer=flv \
+--enable-demuxer=hls \
+--enable-demuxer=live_flv \
+--enable-demuxer=mov \
+--enable-demuxer=mp3 \
+--enable-demuxer=mpegps \
+--enable-demuxer=mpegts \
+--enable-demuxer=mpegvideo \
+--enable-demuxer=flac \
+--enable-demuxer=hevc \
+--enable-demuxer=webm_dash_manifest \
+--enable-demuxer=rtsp \
+--enable-demuxer=rtp \
+--enable-demuxer=h264 \
+--enable-demuxer=mp4 \
+--enable-demuxer=image2 \
+--disable-muxers \
+--enable-muxer=rtsp \
+--enable-muxer=rtp \
+--enable-muxer=flv \
+--enable-muxer=h264 \
+--enable-muxer=mp4 \
+--enable-muxer=hevc \
+--enable-muxer=image2 \
+--disable-parsers \
+--enable-parser=aac \
+--enable-parser=aac_latm \
+--enable-parser=h264 \
+--enable-parser=flac \
+--enable-parser=hevc \
+--enable-protocols \
+--enable-protocol=async \
+--disable-protocol=bluray \
+--disable-protocol=concat \
+--disable-protocol=crypto \
+--disable-protocol=ffrtmpcrypt \
+--enable-protocol=ffrtmphttp \
+--disable-protocol=gopher \
+--disable-protocol=icecast \
+--disable-protocol=librtmp* \
+--disable-protocol=libssh \
+--disable-protocol=md5 \
+--disable-protocol=mmsh \
+--disable-protocol=mmst \
+--disable-protocol=rtmp* \
+--enable-protocol=rtmp \
+--enable-protocol=rtmpt \
+--disable-protocol=rtp \
+--disable-protocol=sctp \
+--disable-protocol=srtp \
+--disable-protocol=subfile \
+--disable-protocol=unix \
+--disable-indevs \
+--disable-outdevs \
+--disable-stripping \
+--enable-asm
+}
+build_one
+make
+make install
--- a/duix-sdk/src/main/cpp/third/arm/armeabi-v7a/libcurl.la
+++ b/duix-sdk/src/main/cpp/third/arm/armeabi-v7a/libcurl.la
@ -0,0 +1,41 @@
+# libcurl.la - a libtool library file
+# Generated by libtool (GNU libtool) 2.4.6
+#
+# Please DO NOT delete this file!
+# It is necessary for linking the library.
+
+# The name that we can dlopen(3).
+dlname=''
+
+# Names of this library.
+library_names=''
+
+# The name of the static archive.
+old_library='libcurl.a'
+
+# Linker flags that cannot go in dependency_libs.
+inherited_linker_flags=''
+
+# Libraries that this one depends upon.
+dependency_libs=' -L/Users/rying/repo/openssl-curl-android/openssl/build/armeabi-v7a/lib -lssl -lcrypto -lz'
+
+# Names of additional weak libraries provided by this library
+weak_library_names=''
+
+# Version information for libcurl.
+current=0
+age=0
+revision=0
+
+# Is this an already installed library?
+installed=yes
+
+# Should we warn about portability when linking against -modules?
+shouldnotlink=no
+
+# Files to dlopen/dlpreopen
+dlopen=''
+dlpreopen=''
+
+# Directory that this library needs to be installed in:
+libdir='/Users/rying/repo/openssl-curl-android/curl/build/armeabi-v7a/lib'
--- a/duix-sdk/src/main/cpp/third/arm/include/ffmpeg/libavcodec/avcodec.h
+++ b/duix-sdk/src/main/cpp/third/arm/include/ffmpeg/libavcodec/avcodec.h
--- a/duix-sdk/src/main/cpp/third/arm/include/ffmpeg/libavcodec/avdct.h
+++ b/duix-sdk/src/main/cpp/third/arm/include/ffmpeg/libavcodec/avdct.h
@ -0,0 +1,84 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVCODEC_AVDCT_H
+#define AVCODEC_AVDCT_H
+
+#include "libavutil/opt.h"
+
+/**
+ * AVDCT context.
+ * @note function pointers can be NULL if the specific features have been
+ *       disabled at build time.
+ */
+typedef struct AVDCT {
+    const AVClass *av_class;
+
+    void (*idct)(int16_t *block /* align 16 */);
+
+    /**
+     * IDCT input permutation.
+     * Several optimized IDCTs need a permutated input (relative to the
+     * normal order of the reference IDCT).
+     * This permutation must be performed before the idct_put/add.
+     * Note, normally this can be merged with the zigzag/alternate scan<br>
+     * An example to avoid confusion:
+     * - (->decode coeffs -> zigzag reorder -> dequant -> reference IDCT -> ...)
+     * - (x -> reference DCT -> reference IDCT -> x)
+     * - (x -> reference DCT -> simple_mmx_perm = idct_permutation
+     *    -> simple_idct_mmx -> x)
+     * - (-> decode coeffs -> zigzag reorder -> simple_mmx_perm -> dequant
+     *    -> simple_idct_mmx -> ...)
+     */
+    uint8_t idct_permutation[64];
+
+    void (*fdct)(int16_t *block /* align 16 */);
+
+
+    /**
+     * DCT algorithm.
+     * must use AVOptions to set this field.
+     */
+    int dct_algo;
+
+    /**
+     * IDCT algorithm.
+     * must use AVOptions to set this field.
+     */
+    int idct_algo;
+
+    void (*get_pixels)(int16_t *block /* align 16 */,
+                       const uint8_t *pixels /* align 8 */,
+                       ptrdiff_t line_size);
+
+    int bits_per_sample;
+} AVDCT;
+
+/**
+ * Allocates an AVDCT context.
+ * This needs to be initialized with avcodec_dct_init() after optionally
+ * configuring it with AVOptions.
+ *
+ * To free it use av_free()
+ */
+AVDCT *avcodec_dct_alloc(void);
+int avcodec_dct_init(AVDCT *);
+
+const AVClass *avcodec_dct_get_class(void);
+
+#endif /* AVCODEC_AVDCT_H */
--- a/duix-sdk/src/main/cpp/third/arm/include/ffmpeg/libavcodec/avfft.h
+++ b/duix-sdk/src/main/cpp/third/arm/include/ffmpeg/libavcodec/avfft.h
@ -0,0 +1,118 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVCODEC_AVFFT_H
+#define AVCODEC_AVFFT_H
+
+/**
+ * @file
+ * @ingroup lavc_fft
+ * FFT functions
+ */
+
+/**
+ * @defgroup lavc_fft FFT functions
+ * @ingroup lavc_misc
+ *
+ * @{
+ */
+
+typedef float FFTSample;
+
+typedef struct FFTComplex {
+    FFTSample re, im;
+} FFTComplex;
+
+typedef struct FFTContext FFTContext;
+
+/**
+ * Set up a complex FFT.
+ * @param nbits           log2 of the length of the input array
+ * @param inverse         if 0 perform the forward transform, if 1 perform the inverse
+ */
+FFTContext *av_fft_init(int nbits, int inverse);
+
+/**
+ * Do the permutation needed BEFORE calling av_fft_calc().
+ */
+void av_fft_permute(FFTContext *s, FFTComplex *z);
+
+/**
+ * Do a complex FFT with the parameters defined in av_fft_init(). The
+ * input data must be permuted before. No 1.0/sqrt(n) normalization is done.
+ */
+void av_fft_calc(FFTContext *s, FFTComplex *z);
+
+void av_fft_end(FFTContext *s);
+
+FFTContext *av_mdct_init(int nbits, int inverse, double scale);
+void av_imdct_calc(FFTContext *s, FFTSample *output, const FFTSample *input);
+void av_imdct_half(FFTContext *s, FFTSample *output, const FFTSample *input);
+void av_mdct_calc(FFTContext *s, FFTSample *output, const FFTSample *input);
+void av_mdct_end(FFTContext *s);
+
+/* Real Discrete Fourier Transform */
+
+enum RDFTransformType {
+    DFT_R2C,
+    IDFT_C2R,
+    IDFT_R2C,
+    DFT_C2R,
+};
+
+typedef struct RDFTContext RDFTContext;
+
+/**
+ * Set up a real FFT.
+ * @param nbits           log2 of the length of the input array
+ * @param trans           the type of transform
+ */
+RDFTContext *av_rdft_init(int nbits, enum RDFTransformType trans);
+void av_rdft_calc(RDFTContext *s, FFTSample *data);
+void av_rdft_end(RDFTContext *s);
+
+/* Discrete Cosine Transform */
+
+typedef struct DCTContext DCTContext;
+
+enum DCTTransformType {
+    DCT_II = 0,
+    DCT_III,
+    DCT_I,
+    DST_I,
+};
+
+/**
+ * Set up DCT.
+ *
+ * @param nbits           size of the input array:
+ *                        (1 << nbits)     for DCT-II, DCT-III and DST-I
+ *                        (1 << nbits) + 1 for DCT-I
+ * @param type            the type of transform
+ *
+ * @note the first element of the input of DST-I is ignored
+ */
+DCTContext *av_dct_init(int nbits, enum DCTTransformType type);
+void av_dct_calc(DCTContext *s, FFTSample *data);
+void av_dct_end (DCTContext *s);
+
+/**
+ * @}
+ */
+
+#endif /* AVCODEC_AVFFT_H */
--- a/duix-sdk/src/main/cpp/third/arm/include/ffmpeg/libavcodec/d3d11va.h
+++ b/duix-sdk/src/main/cpp/third/arm/include/ffmpeg/libavcodec/d3d11va.h
@ -0,0 +1,112 @@
+/*
+ * Direct3D11 HW acceleration
+ *
+ * copyright (c) 2009 Laurent Aimar
+ * copyright (c) 2015 Steve Lhomme
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVCODEC_D3D11VA_H
+#define AVCODEC_D3D11VA_H
+
+/**
+ * @file
+ * @ingroup lavc_codec_hwaccel_d3d11va
+ * Public libavcodec D3D11VA header.
+ */
+
+#if !defined(_WIN32_WINNT) || _WIN32_WINNT < 0x0602
+#undef _WIN32_WINNT
+#define _WIN32_WINNT 0x0602
+#endif
+
+#include <stdint.h>
+#include <d3d11.h>
+
+/**
+ * @defgroup lavc_codec_hwaccel_d3d11va Direct3D11
+ * @ingroup lavc_codec_hwaccel
+ *
+ * @{
+ */
+
+#define FF_DXVA2_WORKAROUND_SCALING_LIST_ZIGZAG 1 ///< Work around for Direct3D11 and old UVD/UVD+ ATI video cards
+#define FF_DXVA2_WORKAROUND_INTEL_CLEARVIDEO    2 ///< Work around for Direct3D11 and old Intel GPUs with ClearVideo interface
+
+/**
+ * This structure is used to provide the necessary configurations and data
+ * to the Direct3D11 FFmpeg HWAccel implementation.
+ *
+ * The application must make it available as AVCodecContext.hwaccel_context.
+ *
+ * Use av_d3d11va_alloc_context() exclusively to allocate an AVD3D11VAContext.
+ */
+typedef struct AVD3D11VAContext {
+    /**
+     * D3D11 decoder object
+     */
+    ID3D11VideoDecoder *decoder;
+
+    /**
+      * D3D11 VideoContext
+      */
+    ID3D11VideoContext *video_context;
+
+    /**
+     * D3D11 configuration used to create the decoder
+     */
+    D3D11_VIDEO_DECODER_CONFIG *cfg;
+
+    /**
+     * The number of surfaces in the surface array
+     */
+    unsigned surface_count;
+
+    /**
+     * The array of Direct3D surfaces used to create the decoder
+     */
+    ID3D11VideoDecoderOutputView **surface;
+
+    /**
+     * A bit field configuring the workarounds needed for using the decoder
+     */
+    uint64_t workaround;
+
+    /**
+     * Private to the FFmpeg AVHWAccel implementation
+     */
+    unsigned report_id;
+
+    /**
+      * Mutex to access video_context
+      */
+    HANDLE  context_mutex;
+} AVD3D11VAContext;
+
+/**
+ * Allocate an AVD3D11VAContext.
+ *
+ * @return Newly-allocated AVD3D11VAContext or NULL on failure.
+ */
+AVD3D11VAContext *av_d3d11va_alloc_context(void);
+
+/**
+ * @}
+ */
+
+#endif /* AVCODEC_D3D11VA_H */
--- a/duix-sdk/src/main/cpp/third/arm/include/ffmpeg/libavcodec/dirac.h
+++ b/duix-sdk/src/main/cpp/third/arm/include/ffmpeg/libavcodec/dirac.h
@ -0,0 +1,131 @@
+/*
+ * Copyright (C) 2007 Marco Gerards <marco@gnu.org>
+ * Copyright (C) 2009 David Conrad
+ * Copyright (C) 2011 Jordi Ortiz
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVCODEC_DIRAC_H
+#define AVCODEC_DIRAC_H
+
+/**
+ * @file
+ * Interface to Dirac Decoder/Encoder
+ * @author Marco Gerards <marco@gnu.org>
+ * @author David Conrad
+ * @author Jordi Ortiz
+ */
+
+#include "avcodec.h"
+
+/**
+ * The spec limits the number of wavelet decompositions to 4 for both
+ * level 1 (VC-2) and 128 (long-gop default).
+ * 5 decompositions is the maximum before >16-bit buffers are needed.
+ * Schroedinger allows this for DD 9,7 and 13,7 wavelets only, limiting
+ * the others to 4 decompositions (or 3 for the fidelity filter).
+ *
+ * We use this instead of MAX_DECOMPOSITIONS to save some memory.
+ */
+#define MAX_DWT_LEVELS 5
+
+/**
+ * Parse code values:
+ *
+ * Dirac Specification ->
+ * 9.6.1  Table 9.1
+ *
+ * VC-2 Specification  ->
+ * 10.4.1 Table 10.1
+ */
+
+enum DiracParseCodes {
+    DIRAC_PCODE_SEQ_HEADER      = 0x00,
+    DIRAC_PCODE_END_SEQ         = 0x10,
+    DIRAC_PCODE_AUX             = 0x20,
+    DIRAC_PCODE_PAD             = 0x30,
+    DIRAC_PCODE_PICTURE_CODED   = 0x08,
+    DIRAC_PCODE_PICTURE_RAW     = 0x48,
+    DIRAC_PCODE_PICTURE_LOW_DEL = 0xC8,
+    DIRAC_PCODE_PICTURE_HQ      = 0xE8,
+    DIRAC_PCODE_INTER_NOREF_CO1 = 0x0A,
+    DIRAC_PCODE_INTER_NOREF_CO2 = 0x09,
+    DIRAC_PCODE_INTER_REF_CO1   = 0x0D,
+    DIRAC_PCODE_INTER_REF_CO2   = 0x0E,
+    DIRAC_PCODE_INTRA_REF_CO    = 0x0C,
+    DIRAC_PCODE_INTRA_REF_RAW   = 0x4C,
+    DIRAC_PCODE_INTRA_REF_PICT  = 0xCC,
+    DIRAC_PCODE_MAGIC           = 0x42424344,
+};
+
+typedef struct DiracVersionInfo {
+    int major;
+    int minor;
+} DiracVersionInfo;
+
+typedef struct AVDiracSeqHeader {
+    unsigned width;
+    unsigned height;
+    uint8_t chroma_format;          ///< 0: 444  1: 422  2: 420
+
+    uint8_t interlaced;
+    uint8_t top_field_first;
+
+    uint8_t frame_rate_index;       ///< index into dirac_frame_rate[]
+    uint8_t aspect_ratio_index;     ///< index into dirac_aspect_ratio[]
+
+    uint16_t clean_width;
+    uint16_t clean_height;
+    uint16_t clean_left_offset;
+    uint16_t clean_right_offset;
+
+    uint8_t pixel_range_index;      ///< index into dirac_pixel_range_presets[]
+    uint8_t color_spec_index;       ///< index into dirac_color_spec_presets[]
+
+    int profile;
+    int level;
+
+    AVRational framerate;
+    AVRational sample_aspect_ratio;
+
+    enum AVPixelFormat pix_fmt;
+    enum AVColorRange color_range;
+    enum AVColorPrimaries color_primaries;
+    enum AVColorTransferCharacteristic color_trc;
+    enum AVColorSpace colorspace;
+
+    DiracVersionInfo version;
+    int bit_depth;
+} AVDiracSeqHeader;
+
+/**
+ * Parse a Dirac sequence header.
+ *
+ * @param dsh this function will allocate and fill an AVDiracSeqHeader struct
+ *            and write it into this pointer. The caller must free it with
+ *            av_free().
+ * @param buf the data buffer
+ * @param buf_size the size of the data buffer in bytes
+ * @param log_ctx if non-NULL, this function will log errors here
+ * @return 0 on success, a negative AVERROR code on failure
+ */
+int av_dirac_parse_sequence_header(AVDiracSeqHeader **dsh,
+                                   const uint8_t *buf, size_t buf_size,
+                                   void *log_ctx);
+
+#endif /* AVCODEC_DIRAC_H */
--- a/duix-sdk/src/main/cpp/third/arm/include/ffmpeg/libavcodec/dv_profile.h
+++ b/duix-sdk/src/main/cpp/third/arm/include/ffmpeg/libavcodec/dv_profile.h
@ -0,0 +1,83 @@
+/*
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVCODEC_DV_PROFILE_H
+#define AVCODEC_DV_PROFILE_H
+
+#include <stdint.h>
+
+#include "libavutil/pixfmt.h"
+#include "libavutil/rational.h"
+#include "avcodec.h"
+
+/* minimum number of bytes to read from a DV stream in order to
+ * determine the profile */
+#define DV_PROFILE_BYTES (6 * 80) /* 6 DIF blocks */
+
+
+/*
+ * AVDVProfile is used to express the differences between various
+ * DV flavors. For now it's primarily used for differentiating
+ * 525/60 and 625/50, but the plans are to use it for various
+ * DV specs as well (e.g. SMPTE314M vs. IEC 61834).
+ */
+typedef struct AVDVProfile {
+    int              dsf;                   /* value of the dsf in the DV header */
+    int              video_stype;           /* stype for VAUX source pack */
+    int              frame_size;            /* total size of one frame in bytes */
+    int              difseg_size;           /* number of DIF segments per DIF channel */
+    int              n_difchan;             /* number of DIF channels per frame */
+    AVRational       time_base;             /* 1/framerate */
+    int              ltc_divisor;           /* FPS from the LTS standpoint */
+    int              height;                /* picture height in pixels */
+    int              width;                 /* picture width in pixels */
+    AVRational       sar[2];                /* sample aspect ratios for 4:3 and 16:9 */
+    enum AVPixelFormat pix_fmt;             /* picture pixel format */
+    int              bpm;                   /* blocks per macroblock */
+    const uint8_t   *block_sizes;           /* AC block sizes, in bits */
+    int              audio_stride;          /* size of audio_shuffle table */
+    int              audio_min_samples[3];  /* min amount of audio samples */
+                                            /* for 48kHz, 44.1kHz and 32kHz */
+    int              audio_samples_dist[5]; /* how many samples are supposed to be */
+                                            /* in each frame in a 5 frames window */
+    const uint8_t  (*audio_shuffle)[9];     /* PCM shuffling table */
+} AVDVProfile;
+
+/**
+ * Get a DV profile for the provided compressed frame.
+ *
+ * @param sys the profile used for the previous frame, may be NULL
+ * @param frame the compressed data buffer
+ * @param buf_size size of the buffer in bytes
+ * @return the DV profile for the supplied data or NULL on failure
+ */
+const AVDVProfile *av_dv_frame_profile(const AVDVProfile *sys,
+                                       const uint8_t *frame, unsigned buf_size);
+
+/**
+ * Get a DV profile for the provided stream parameters.
+ */
+const AVDVProfile *av_dv_codec_profile(int width, int height, enum AVPixelFormat pix_fmt);
+
+/**
+ * Get a DV profile for the provided stream parameters.
+ * The frame rate is used as a best-effort parameter.
+ */
+const AVDVProfile *av_dv_codec_profile2(int width, int height, enum AVPixelFormat pix_fmt, AVRational frame_rate);
+
+#endif /* AVCODEC_DV_PROFILE_H */
--- a/duix-sdk/src/main/cpp/third/arm/include/ffmpeg/libavcodec/dxva2.h
+++ b/duix-sdk/src/main/cpp/third/arm/include/ffmpeg/libavcodec/dxva2.h
@ -0,0 +1,93 @@
+/*
+ * DXVA2 HW acceleration
+ *
+ * copyright (c) 2009 Laurent Aimar
+ *
+ * This file is part of FFmpeg.
+ *
+ * FFmpeg is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public
+ * License as published by the Free Software Foundation; either
+ * version 2.1 of the License, or (at your option) any later version.
+ *
+ * FFmpeg is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with FFmpeg; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ */
+
+#ifndef AVCODEC_DXVA2_H
+#define AVCODEC_DXVA2_H
+
+/**
+ * @file
+ * @ingroup lavc_codec_hwaccel_dxva2
+ * Public libavcodec DXVA2 header.
+ */
+
+#if !defined(_WIN32_WINNT) || _WIN32_WINNT < 0x0602
+#undef _WIN32_WINNT
+#define _WIN32_WINNT 0x0602
+#endif
+
+#include <stdint.h>
+#include <d3d9.h>
+#include <dxva2api.h>
+
+/**
+ * @defgroup lavc_codec_hwaccel_dxva2 DXVA2
+ * @ingroup lavc_codec_hwaccel
+ *
+ * @{
+ */
+
+#define FF_DXVA2_WORKAROUND_SCALING_LIST_ZIGZAG 1 ///< Work around for DXVA2 and old UVD/UVD+ ATI video cards
+#define FF_DXVA2_WORKAROUND_INTEL_CLEARVIDEO    2 ///< Work around for DXVA2 and old Intel GPUs with ClearVideo interface
+
+/**
+ * This structure is used to provide the necessary configurations and data
+ * to the DXVA2 FFmpeg HWAccel implementation.
+ *
+ * The application must make it available as AVCodecContext.hwaccel_context.
+ */
+struct dxva_context {
+    /**
+     * DXVA2 decoder object
+     */
+    IDirectXVideoDecoder *decoder;
+
+    /**
+     * DXVA2 configuration used to create the decoder
+     */
+    const DXVA2_ConfigPictureDecode *cfg;
+
+    /**
+     * The number of surfaces in the surface array
+     */
+    unsigned surface_count;
+
+    /**
+     * The array of Direct3D surfaces used to create the decoder
+     */
+    LPDIRECT3DSURFACE9 *surface;
+
+    /**
+     * A bit field configuring the workarounds needed for using the decoder
+     */
+    uint64_t workaround;
+
+    /**
+     * Private to the FFmpeg AVHWAccel implementation
+     */
+    unsigned report_id;
+};
+
+/**
+ * @}
+ */
+
+#endif /* AVCODEC_DXVA2_H */