On-orbit servicing places increasing demands on space manipulator systems for precise execution of complex tasks, high-performance control, and model robustness. Existing research on multi-manipulator cooperative control, however, remains confined mainly to joint space and is limited in both control performance and model robustness. To address these challenges, this paper first establishes a task-space cooperative control dynamics model for multiple heterogeneous space manipulators. A combined-variable approach then reduces the differential control equation of each agent from one that couples multiple time-varying variables to one in a single time-varying variable. An easily implementable online off-policy reinforcement learning control method is developed, which computes the feedback control without direct dependence on the analytical form of the dynamic model. Integrating this method with a centralized-training, distributed-execution control strategy ensures asymptotic stability and global optimality of the control system, both of which are formally proved through rigorous theoretical analysis. Finally, simulations validate the effectiveness and advantages of the proposed algorithm. © 2025 Elsevier B.V., All rights reserved.