Standard Intelligence, a boutique consultancy focused on AI and data strategy, announced the release of FDM-1, a new computer-action model designed to learn how to operate digital interfaces by observing video recordings of real user activity.
The company said in its release statement that the system is trained on more than 11 million hours of screen recordings, a corpus it describes as larger than any publicly available dataset previously used for computer-use modeling. To generate training signals at this scale, the firm applied an automated technique that reconstructs likely user actions, such as keystrokes and cursor movements, directly from visual changes on the screen. This approach allows the model to infer how interactions unfold without relying primarily on manually annotated data.
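The release does not describe how the action-reconstruction technique works internally; in published research this idea usually takes the form of an inverse-dynamics model trained to predict the action that transformed one frame into the next. As a purely illustrative toy, a hand-written heuristic version of that idea, guessing a click location from the region of pixel change between consecutive frames, might look like this (the function name and threshold are hypothetical, not part of FDM-1):

```python
import numpy as np

def infer_click(prev_frame: np.ndarray, next_frame: np.ndarray,
                threshold: int = 25):
    """Toy inverse-dynamics heuristic: guess a click location from the
    region of pixel change between two consecutive screen frames.

    Real systems train a learned model for this labeling step rather
    than using a hand-written rule; this only illustrates the idea of
    recovering actions from visual changes alone.
    """
    # Per-pixel absolute difference between the two frames.
    diff = np.abs(prev_frame.astype(int) - next_frame.astype(int))
    # Coordinates of pixels that changed noticeably.
    changed = np.argwhere(diff > threshold)
    if changed.size == 0:
        return None  # no visible change, so no action is inferred
    # Use the centroid of the changed region as the guessed click point.
    y, x = changed.mean(axis=0)
    return (int(round(x)), int(round(y)))
```

A learned version replaces the thresholding and centroid steps with a network that maps frame pairs to full action labels (clicks, keystrokes, scrolls), which is what lets the labeling scale to millions of hours of video.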
FDM-1 is built to process long, continuous video streams, enabling it to follow nearly two hours of uninterrupted screen activity in a single session. The extended context window allows the model to capture complex workflows that unfold over longer time horizons in domains such as engineering, design, and financial operations. The company said this capability enables the system to reason over more visual context than earlier computer-use agents, which are typically limited to short sequences or static screenshots.
In demonstrations released alongside the announcement, the model was shown performing a range of tasks, including building mechanical components in computer-aided design software, identifying software bugs through automated interface exploration, and controlling a real vehicle on public streets in San Francisco using live visual feeds and keyboard inputs. According to the company, the driving demonstration required less than one hour of task-specific fine-tuning.
The firm stated that FDM-1 is designed to operate directly on raw video rather than simplified visual snapshots, enabling the model to learn continuous actions such as scrolling, dragging, and three-dimensional manipulation. By predicting the next user action based on both visual frames and prior interaction history, the system aims to generalize across a wide range of software environments without the need for task-specific reinforcement learning setups.
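The announcement does not specify FDM-1's training format, but the objective it describes, predicting the next user action from visual frames plus prior interaction history, corresponds to a standard next-token setup over an interleaved stream of observations and actions. A minimal sketch of how such a trajectory could be turned into supervised next-action examples (all names here are illustrative assumptions, not the company's code):

```python
def make_training_examples(trajectory):
    """Turn an interleaved screen/action trajectory into next-action
    prediction examples.

    `trajectory` is a list of (frame, action) pairs, where `frame` is a
    stand-in for a screen observation and `action` for the user input
    that followed it. For each step, the context is every frame and
    action seen so far, and the target is the action taken next --
    mirroring the next-action objective described in the release.
    """
    examples = []
    history = []
    for frame, action in trajectory:
        history.append(("frame", frame))
        # Predict this action from all frames and actions before it.
        examples.append((list(history), ("action", action)))
        history.append(("action", action))
    return examples
```

Because the supervision comes directly from recorded behavior, this formulation needs no per-task reward design, which is consistent with the company's claim that the model generalizes without task-specific reinforcement learning setups.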
The company said the broader objective behind the launch is to move computer-use agents from a data-constrained development model to a compute-constrained one, allowing far larger volumes of publicly available instructional and workflow video to be used for training. Executives described the release as a step toward enabling AI systems to learn how people work with digital tools in practice, much as large language models learned patterns of writing and communication from internet text.
The post Standard Intelligence Launches FDM-1, AI System Capable Of Learning Complex Computer Tasks From Video appeared first on Metaverse Post.


