Today's Open Source (2024-07-16): H2O-Danube3 for Smartphones
Discover the latest in AI: H2O-Danube3 for smartphones, MotionClone for video generation, Cradle for game AI, and more. Explore now!
Here are some interesting AI open-source models and frameworks I want to share today.
Project: H2O Danube3
H2O-Danube3 is a series of small language models comprising H2O-Danube3-4B and H2O-Danube3-500M, trained on 6T and 4T tokens, respectively. The models are pretrained on high-quality, predominantly English web data in three stages with different data mixes, and then fine-tuned for chat.
Thanks to their compact architecture, the H2O-Danube3 models run efficiently on modern smartphones, enabling fast, fully local inference on mobile devices (see the sketch after the links below).
https://arxiv.org/abs/2407.09276
https://huggingface.co/h2oai/h2o-danube3-500m-base
https://huggingface.co/h2oai/h2o-danube3-4b-base
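For a sense of how little code is needed to try the smaller checkpoint, here is a minimal sketch using the standard Hugging Face transformers text-generation API with the h2o-danube3-500m-base model linked above; the dtype and generation settings are illustrative choices, not the authors' recommended configuration.

```python
# Minimal sketch: local inference with the 500M base checkpoint via transformers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "h2oai/h2o-danube3-500m-base"  # base checkpoint linked above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # halves memory; drop if your hardware lacks bf16
)

prompt = "Small language models are useful on phones because"
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```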
Project: MotionClone
MotionClone is a training-free framework that clones actions from reference videos to control text-to-video generation.
It captures the motion of the reference video through the temporal attention maps obtained from video inversion, and introduces primary temporal-attention guidance to suppress noisy or very subtle motions in the attention weights.
It also proposes a location-aware semantic guidance mechanism that uses the rough foreground location from the reference video, together with the original classifier-free guidance features, to steer video generation.
https://arxiv.org/abs/2406.05338
https://github.com/Bujiazi/MotionClone
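Conceptually, the primary temporal-attention guidance keeps only the dominant entries of the reference video's temporal attention and pushes the generated video's attention toward them during sampling. The PyTorch fragment below is a rough sketch of that masking-and-matching step under my own assumptions about tensor shapes and the top-k selection; it is not the paper's exact formulation.

```python
# Rough sketch of primary temporal-attention guidance (shapes/threshold illustrative).
import torch
import torch.nn.functional as F

def primary_temporal_attention_loss(attn_ref, attn_gen, k=1):
    """attn_ref, attn_gen: temporal attention maps, shape
    (heads, spatial_positions, frames, frames)."""
    # Keep only the dominant ("primary") components of the reference attention:
    # a binary mask selecting the top-k entries along the key-frame axis.
    topk = attn_ref.topk(k, dim=-1).indices
    mask = torch.zeros_like(attn_ref).scatter_(-1, topk, 1.0)

    # Guide generation by matching attention only where the mask is active,
    # which downweights noisy or very subtle motion cues.
    return F.mse_loss(attn_gen * mask, attn_ref * mask)

# During each denoising step, the gradient of this loss w.r.t. the latent
# would be added to the sampling update, classifier-guidance style.
```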
Project: Cradle
Cradle, developed by Kunlun Wanwei and the Beijing Academy of Artificial Intelligence, is an open-source AI framework that can play various commercial games and operate software applications.
This new general computer control framework allows AI agents to control keyboards and mice like humans, without relying on any internal APIs, enabling interactions with any software.
It provides strong reasoning, self-improvement, and skill management in a standardized general environment, enabling agents to carry out diverse computing tasks with minimal requirements on the environment.
https://arxiv.org/abs/2403.03186
https://github.com/BAAI-Agents/Cradle
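Cradle's core interaction pattern is simple to state: take a screenshot, reason over it, then issue keyboard and mouse events. The toy loop below sketches that pattern with pyautogui for input/output; the query_vlm helper and the action dictionary format are hypothetical stand-ins, not Cradle's actual interfaces.

```python
# Toy sketch of a screenshot -> reason -> act loop in the spirit of Cradle.
import time
import pyautogui

def query_vlm(screenshot, goal):
    """Hypothetical stand-in: send the screenshot and goal to a vision-language
    model and get back an action dict, e.g. {"type": "click", "x": 100, "y": 200}."""
    raise NotImplementedError

def run_agent(goal, steps=10):
    for _ in range(steps):
        shot = pyautogui.screenshot()    # observe the screen like a human would
        action = query_vlm(shot, goal)   # reason over pixels, no internal game API
        if action["type"] == "click":
            pyautogui.click(action["x"], action["y"])
        elif action["type"] == "type":
            pyautogui.write(action["text"], interval=0.05)
        elif action["type"] == "key":
            pyautogui.press(action["key"])
        time.sleep(1.0)                  # give the application time to react
```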
Project: LLM Graph Builder
LLM Graph Builder by Neo4j is an open-source application that builds knowledge graphs from unstructured data (PDFs, documents, plain text, YouTube videos, web pages, and more) and stores them in Neo4j.
It uses large language models such as OpenAI, Gemini, Llama 3, Diffbot, Claude, and Qwen to extract nodes, relationships, and their properties from the unstructured data.
https://github.com/neo4j-labs/llm-graph-builder
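The underlying pattern is to prompt an LLM for (head, relation, tail) triples from each text chunk and then write them into Neo4j with MERGE statements. Below is a simplified sketch using the official neo4j Python driver; extract_triples is a hypothetical placeholder for the LLM call, and the graph schema is an illustrative choice rather than the project's actual data model.

```python
# Simplified sketch of the extract-then-store pattern behind LLM Graph Builder.
from neo4j import GraphDatabase

def extract_triples(text: str) -> list[tuple[str, str, str]]:
    """Hypothetical placeholder: ask an LLM to return (head, relation, tail)
    triples found in `text`, e.g. [("Neo4j", "DEVELOPS", "LLM Graph Builder")]."""
    raise NotImplementedError

def store_triples(uri, user, password, triples):
    driver = GraphDatabase.driver(uri, auth=(user, password))
    with driver.session() as session:
        for head, rel, tail in triples:
            # MERGE avoids duplicate nodes; the relation type is kept as a
            # property so the query stays parameterizable.
            session.run(
                "MERGE (h:Entity {name: $head}) "
                "MERGE (t:Entity {name: $tail}) "
                "MERGE (h)-[r:RELATED {type: $rel}]->(t)",
                head=head, tail=tail, rel=rel,
            )
    driver.close()
```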
Project: mllm
mllm is a fast and lightweight multimodal large language model inference engine designed for mobile and edge devices.
It is implemented in pure C/C++ with no dependencies, supports ARM NEON and x86 AVX2, and offers 4-bit and 6-bit integer quantization.
mllm enables smart personal assistants, text-based image retrieval, and screen visual question answering on devices, ensuring data privacy.
https://github.com/UbiquitousLearning/mllm
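Low-bit weight quantization of the kind mllm supports boils down to storing small integers plus a scale per group of weights. The NumPy sketch below shows symmetric group-wise 4-bit quantization as a general illustration; the group size and encoding are assumptions, not mllm's actual storage format.

```python
# Illustrative symmetric group-wise 4-bit quantization (not mllm's exact format).
import numpy as np

def quantize_q4(weights: np.ndarray, group_size: int = 32):
    """Quantize a 1-D float array to 4-bit integers with one scale per group."""
    w = weights.reshape(-1, group_size)
    scales = np.abs(w).max(axis=1, keepdims=True) / 7.0   # symmetric int4 range [-7, 7]
    q = np.clip(np.round(w / scales), -7, 7).astype(np.int8)
    return q, scales

def dequantize_q4(q: np.ndarray, scales: np.ndarray) -> np.ndarray:
    return (q.astype(np.float32) * scales).reshape(-1)

w = np.random.randn(1024).astype(np.float32)
q, s = quantize_q4(w)
print("max reconstruction error:", np.abs(dequantize_q4(q, s) - w).max())
```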
Project: Embodied AI Paper List
Embodied AI Paper List is a curated collection of excellent papers and reviews on Embodied AI.
Maintained by Sun Yat-sen University's HCPLab team, this project aims to compile and share the latest Embodied AI research and resources.
It includes papers and books from various subfields such as multimodal large models, vision-language-action models, and general robot learning, helping researchers stay updated with the field's advancements.
https://github.com/HCPLab-SYSU/Embodied_AI_Paper_List
Project: embodied-CoT
Embodied Chain of Thought (ECoT) is a novel method for training robot policies.
The method trains vision-language-action models to generate intermediate reasoning steps about the instruction and the scene image before predicting the robot action, which improves performance, interpretability, and generalization.
The project code is based on OpenVLA and provides detailed documentation on code and dependencies.
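The key idea is that the policy emits a structured reasoning trace (task, plan, subtask, movement) before the low-level action. The sketch below shows what parsing such an output could look like; the field names and the 7-number action format are purely illustrative assumptions, not the project's actual output schema.

```python
# Illustrative parser for an ECoT-style output: reasoning fields first, action last.
# The tag names and the 7-DoF action format are assumptions for this sketch.
import re

example_output = (
    "TASK: put the carrot in the bowl; "
    "PLAN: locate carrot, grasp, move over bowl, release; "
    "SUBTASK: grasp the carrot; "
    "MOVE: reach forward and down; "
    "ACTION: 0.02 -0.01 -0.03 0.00 0.00 0.10 1.00"
)

def parse_ecot(output: str):
    fields = dict(re.findall(r"(\w+):\s*([^;]+)", output))
    action = [float(x) for x in fields.pop("ACTION").split()]
    return fields, action  # reasoning steps for interpretability, action for control

reasoning, action = parse_ecot(example_output)
print(reasoning["SUBTASK"], "->", action)
```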