Future Internet of Things (IoT) networks will include a Hybrid Access Point (HAP) that powers devices via Radio Frequency (RF) signals. These devices harvest RF energy and are tasked with monitoring one or more targets, such as vehicles, at key locations. To maximize the expected target monitoring time of these devices, this paper considers the problem of optimizing the HAP's transmit power and the active time of devices. We formulate a Stochastic Program (SP) and solve it using the Sample Average Approximation method. We also present a sequential Monte Carlo based reinforcement learning method, called SMC-L, to learn sensor activation time. Our results show that (i) the confidence interval upper bound of the solution derived by the SP is affected by the number of samples that represent target appearance times, (ii) increasing the sensing range of devices affects the total expected target monitoring time, (iii) the learning speed of SMC-L is affected by its exploration rate, and (iv) SMC-L does not fully utilize the energy of devices under poor channel conditions.