Monocular vision has attracted considerable attention in visual SLAM research due to its simple structure and low cost. However, it faces an inherent limitation in dense map reconstruction: the absence of direct depth information. To address this challenge, we propose a dense point cloud reconstruction method built on a monocular-inertial sparse visual SLAM framework. The approach extends ORB-SLAM3 by integrating a deep-learning-based monocular depth estimation network with attention mechanisms. The predicted dense depth maps are fused with the camera poses estimated by SLAM to generate a globally consistent 3D point cloud map. Experiments on public datasets demonstrate that our method maintains accurate localization while producing dense, structurally complete point cloud maps. On KITTI, the depth estimation accuracy reaches 90.1% of pixels within the threshold δ1 < 1.25. The system is therefore well suited to large-scale outdoor 3D reconstruction and environmental perception tasks, offering high practical value.
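The two quantitative ingredients above — fusing a predicted depth map with a SLAM pose into world-frame points, and the δ1 accuracy metric — can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes a pinhole camera with intrinsics `K` and a camera-to-world pose `T_wc`, and the function names are hypothetical. The δ1 metric itself is the standard threshold accuracy, i.e. the fraction of pixels with max(pred/gt, gt/pred) < 1.25.

```python
import numpy as np

def backproject_to_world(depth, K, T_wc):
    """Back-project a dense depth map into a world-frame point cloud.

    depth : (H, W) predicted depth in meters
    K     : (3, 3) pinhole intrinsics
    T_wc  : (4, 4) camera-to-world pose from SLAM
    Returns an (H*W, 3) array of world-frame points.
    """
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    # Homogeneous pixel coordinates (u, v, 1) for every pixel.
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    # Rays in the camera frame, scaled by predicted depth.
    rays = pix @ np.linalg.inv(K).T
    pts_cam = rays * depth.reshape(-1, 1)
    # Transform camera-frame points into the world frame.
    pts_h = np.hstack([pts_cam, np.ones((pts_cam.shape[0], 1))])
    return (pts_h @ T_wc.T)[:, :3]

def delta1_accuracy(pred, gt, thresh=1.25):
    """Fraction of valid pixels whose depth ratio is within the threshold."""
    mask = gt > 0  # ignore pixels without ground-truth depth
    ratio = np.maximum(pred[mask] / gt[mask], gt[mask] / pred[mask])
    return float(np.mean(ratio < thresh))
```

In a full pipeline, per-keyframe clouds produced this way would be accumulated (and typically voxel-filtered) to form the globally consistent map; a reported value such as 90.1% means that fraction of evaluated pixels passes the δ1 test.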