  • Letter

Glider soaring via reinforcement learning in the field

Abstract

Soaring birds often rely on ascending thermal plumes (thermals) in the atmosphere as they search for prey or migrate across large distances [1–4]. The landscape of convective currents is rugged and shifts on timescales of a few minutes as thermals constantly form, disintegrate or are transported away by the wind [5,6]. How soaring birds find and navigate thermals within this complex landscape is unknown. Reinforcement learning [7] provides an appropriate framework in which to identify an effective navigational strategy as a sequence of decisions made in response to environmental cues. Here we use reinforcement learning to train a glider in the field to navigate atmospheric thermals autonomously. We equipped a glider of two-metre wingspan with a flight controller that precisely controlled the bank angle and pitch, modulating these at intervals with the aim of gaining as much lift as possible. A navigational strategy was determined solely from the glider’s pooled experiences, collected over several days in the field. The strategy relies on on-board methods to accurately estimate the local vertical wind accelerations and the roll-wise torques on the glider, which serve as navigational cues. We establish the validity of our learned flight policy through field experiments, numerical simulations and estimates of the noise in measurements caused by atmospheric turbulence. Our results highlight the role of vertical wind accelerations and roll-wise torques as effective mechanosensory cues for soaring birds and provide a navigational strategy that is directly applicable to the development of autonomous soaring vehicles.

Fig. 1: Soaring in the field by using turbulent navigational cues.
Fig. 2: Convergence of the learning algorithm and the learned strategy for navigating thermal plumes.
Fig. 3: Performance of the learned strategy and its dependence on the wingspan.

Data availability

The data that support the findings of this study are available from the corresponding author upon reasonable request.

References

  1. Newton, I. Migration Ecology of Soaring Birds 1st edn (Elsevier, Amsterdam, 2008).

  2. Shamoun-Baranes, J., Leshem, Y., Yom-Tov, Y. & Liechti, O. Differential use of thermal convection by soaring birds over central Israel. Condor 105, 208–218 (2003).

  3. Weimerskirch, H., Bishop, C., Jeanniard-du-Dot, T., Prudor, A. & Sachs, G. Frigate birds track atmospheric conditions over months-long transoceanic flights. Science 353, 74–78 (2016).

  4. Pennycuick, C. J. Thermal soaring compared in three dissimilar tropical bird species, Fregata magnificens, Pelecanus occidentalis and Coragyps atratus. J. Exp. Biol. 102, 307–325 (1983).

  5. Garratt, J. R. The Atmospheric Boundary Layer (Cambridge Univ. Press, Cambridge, 1994).

  6. Lenschow, D. H. & Stephens, P. L. The role of thermals in the atmospheric boundary layer. Boundary-Layer Meteorol. 19, 509–532 (1980).

  7. Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction 1st edn (MIT Press, Cambridge, 1998).

  8. Tesauro, G. Temporal difference learning and TD-Gammon. Commun. ACM 38, 58–68 (1995).

  9. Silver, D. et al. Mastering the game of Go without human knowledge. Nature 550, 354–359 (2017).

  10. Mnih, V. et al. Human-level control through deep reinforcement learning. Nature 518, 529–533 (2015).

  11. Kim, H. J., Jordan, M. I., Sastry, S. & Ng, A. in Advances in Neural Information Processing Systems Vol. 16 (eds Thrun, S. et al.) 799–806 (MIT Press, Cambridge, 2004).

  12. Levine, S., Finn, C., Darrell, T. & Abbeel, P. End-to-end training of deep visuomotor policies. J. Mach. Learn. Res. 17, 1–40 (2016).

  13. Allen, M. J. & Lin, V. Guidance and control of an autonomous soaring vehicle with flight test results. In 45th AIAA Aerospace Sciences Meeting and Exhibit 2007-867 (AIAA, 2007).

  14. Edwards, D. J. Implementation details and flight test results of an autonomous soaring controller. In AIAA Guidance, Navigation and Control Conference and Exhibit 2008-7244 (AIAA, 2008).

  15. Edwards, D. J. Autonomous Soaring: The Montague Cross Country Challenge. PhD thesis, North Carolina State Univ. (2010).

  16. Ákos, Z., Nagy, M., Leven, S. & Vicsek, T. Thermal soaring flight of birds and unmanned aerial vehicles. Bioinspir. Biomim. 5, 045003 (2010).

  17. Doncieux, S., Mouret, J. B. & Meyer, J.-A. Soaring behaviors in UAVs: ‘animat’ design methodology and current results. In 3rd US–European Competition and Workshop on Micro Air Vehicle Systems (MAV07) and European Micro Air Vehicle Conference and Flight Competition (EMAV2007) (2007); http://www.isir.upmc.fr/files/2007ACTI734.pdf.

  18. Wharington, J. & Herszberg, I. Control of a high endurance unmanned aerial vehicle. In 21st Congress of International Council of the Aeronautical Sciences 98-3.7.1 (ICAS, 1998).

  19. Chung, J. J., Lawrance, N. R. J. & Sukkarieh, S. Learning to soar: resource-constrained exploration in reinforcement learning. Int. J. Robot. Res. 34, 158–172 (2015).

  20. Reddy, G., Celani, A., Sejnowski, T. & Vergassola, M. Learning to soar in turbulent environments. Proc. Natl Acad. Sci. USA 113, E4877–E4884 (2016).

  21. Yeung, P. K. & Pope, S. B. Lagrangian statistics from direct numerical simulations of isotropic turbulence. J. Fluid Mech. 207, 531–586 (1989).

  22. Voth, G. A., La Porta, A., Crawford, A. M., Alexander, J. & Bodenschatz, E. Measurement of particle accelerations in fully developed turbulence. J. Fluid Mech. 469, 121–160 (2002).

  23. Tennekes, H. & Lumley, J. L. A First Course in Turbulence (MIT Press, Cambridge, 1972).

  24. Reichmann, H. Cross-Country Soaring (Thomson Publications, Santa Monica, 1988).

  25. Ng, A. Y., Harada, D. & Russell, S. J. Policy invariance under reward transformations: theory and application to reward shaping. In Proc. 16th International Conference on Machine Learning (eds Bratko, I. & Dzeroski, S.) 278–287 (Morgan Kaufmann, San Francisco, 1999).

  26. MacCready, P. B. J. Optimum airspeed selector. Soaring 1958, 10–11 (1958).

  27. Horvitz, N. et al. The gliding speed of migrating birds: slow and safe or fast and risky? Ecol. Lett. 17, 670–679 (2014).

  28. Cochrane, J. H. MacCready theory with uncertain lift and limited altitude. Tech. Soaring 23, 88–96 (1999).

  29. Frisch, U. Turbulence: The Legacy of A. N. Kolmogorov (Cambridge Univ. Press, Cambridge, 1995).

Acknowledgements

This work was supported by Simons Foundation grant 340106 (to M.V.) and NSF grant NCS-FO-1735004 (to T.J.S.).

Reviewer information

Nature thanks M. Chertkov and the other anonymous reviewer(s) for their contribution to the peer review of this work.

Author information

Contributions

All authors were involved in designing the study and drafting the final manuscript. G.R. and J.W.N. performed the experiments and analysed the data. G.R., A.C. and M.V. contributed to the theoretical results.

Corresponding author

Correspondence to Massimo Vergassola.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data figures and tables

Extended Data Fig. 1 Sample trajectories obtained in the field.

The three-dimensional view and top view are shown of the glider’s trajectory as it executes the learned strategy for thermals (labelled ‘s’) or a random policy that takes actions with equal probability (labelled ‘r’). The trajectories are coloured according to the instantaneous vertical ground velocity u_z. The green (red) dot shows the start (end) point of the trajectory. Trajectories s1, s2 and r1 last for 3 min each, whereas s3 lasts for about 8 min.

Extended Data Fig. 2 Force–body diagram of a glider.

The forces on a glider and the definitions of the various angles that determine the glider’s motion.

Extended Data Fig. 3 Modelling the longitudinal motion of the glider.

a, Sample trajectory of a glider’s pitch and its vertical velocity with respect to ground (u_z) in a case in which the feedback control over the pitch is reduced in order to exaggerate the pitch oscillations. The blue line shows the measured u_z, and the orange line is u_z obtained after subtracting the contributions from longitudinal motions of the glider (see Supplementary Information). b, The blue line shows the average change in u_z when a particular action is taken (labelled above each panel), averaged over n 3-s intervals. The 13 panels correspond to the 13 possible bank angle changes: starting from one of the angles 0°, ±15° or ±30°, the bank angle is increased by 15°, decreased by 15° or kept the same, without leaving the ±30° range. The green dashed line shows the prediction from the model whereas the orange line is the estimated w_z. The axis on the right shows the averaged pitch (red dashed line).
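The count of 13 panels in Extended Data Fig. 3b can be checked with a short enumeration. The sketch below is illustrative only: it assumes that the two changes leaving the ±30° range (increasing from +30° and decreasing from −30°) are the ones excluded, and the names BANK_ANGLES, ACTIONS and allowed_transitions are hypothetical, not taken from the paper.

```python
from itertools import product

# Bank angles (degrees) and bank angle changes (degrees) named in the caption.
BANK_ANGLES = (-30, -15, 0, 15, 30)
ACTIONS = (-15, 0, 15)  # decrease, keep, increase


def allowed_transitions(angles=BANK_ANGLES, actions=ACTIONS):
    """Return (current_angle, next_angle) pairs that stay within the angle range.

    Assumption: a change is disallowed only if it would leave [min, max].
    """
    lo, hi = min(angles), max(angles)
    return [(a, a + d) for a, d in product(angles, actions) if lo <= a + d <= hi]


transitions = allowed_transitions()
print(len(transitions))  # 5 angles x 3 actions, minus the 2 out-of-range changes
```

Of the 15 angle and action combinations, only the two that would exceed ±30° are dropped, which leaves the 13 panels described in the caption.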

Extended Data Fig. 4 The estimated vertical wind acceleration is unbiased after accounting for the glider’s longitudinal motion.

a, The averaged vertical wind acceleration a_z in units of its standard deviation. a_z, plotted as in Extended Data Fig. 3b, is shown with (blue line) and without (orange line) accounting for the glider’s longitudinal motions. The axis on the right shows the airspeed (green dashed line). b, Probability density functions (PDFs) of a_z for the different bank angle changes. The black dashed line shows the median.

Extended Data Fig. 5 The estimated roll-wise torque is unbiased after accounting for the effects of feedback control and glider aerodynamics.

a, The averaged evolution of the bank angle shown as in Extended Data Fig. 3b. The blue line shows the measured bank angle and the dashed orange line shows the best-fit line obtained from simultaneously fitting the 13 blue curves to the prediction (see Supplementary Information). b, PDFs of the roll-wise torque ω (in units of its standard deviation) for the different bank angle changes. The black dashed line shows the median value.
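The unbiasedness checks in Extended Data Figs. 4b and 5b compare the median of a cue, in units of its standard deviation, across the bank angle changes. A minimal sketch of that kind of summary follows. It assumes the standard deviation is pooled over all bank angle changes (the captions do not say whether it is pooled or per panel), and the function name and input layout are hypothetical.

```python
import statistics


def standardized_medians(samples_by_action):
    """Median of each action's samples, in units of the pooled standard deviation.

    samples_by_action maps a label for a bank angle change to a list of cue
    estimates (for example vertical wind acceleration or roll-wise torque).
    Assumption: one pooled standard deviation is used for every action.
    """
    pooled = [x for xs in samples_by_action.values() for x in xs]
    sigma = statistics.pstdev(pooled)
    if sigma == 0:
        raise ValueError("samples have zero spread; cannot standardize")
    return {
        action: statistics.median(xs) / sigma
        for action, xs in samples_by_action.items()
    }


# Made-up numbers for illustration only; these are not data from the study.
example = {"0 -> 15": [-1.0, 0.0, 1.0], "15 -> 15": [-2.0, 0.0, 2.0]}
print(standardized_medians(example))
```

An estimate that is unbiased with respect to the glider's own manoeuvres would give standardized medians close to zero for every bank angle change.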

Extended Data Fig. 6 The distribution of the strength of vertical currents observed in the field.

The root-mean-square vertical wind velocity measured in the field is pooled from about 240 3-min trials collected over 9 days. The dashed red line shows the threshold criterion imposed when measuring the performance of the strategy in the field (see Methods).
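The quantity plotted in Extended Data Fig. 6 is a per-trial root-mean-square (RMS) of the vertical wind velocity. The sketch below shows how such a value and a threshold filter could be computed. The threshold value is defined in the Methods and is not reproduced here, so it is left as a parameter, and keeping trials strictly above the threshold is an assumption about the direction of the criterion. Function names are hypothetical.

```python
import math


def rms(values):
    """Root-mean-square of a sequence of vertical wind velocity samples."""
    values = list(values)
    if not values:
        raise ValueError("need at least one sample")
    return math.sqrt(sum(v * v for v in values) / len(values))


def trials_above_threshold(trials, threshold):
    """Keep the trials whose RMS vertical wind velocity exceeds the threshold.

    Assumption: the criterion selects trials with stronger vertical currents.
    """
    return [trial for trial in trials if rms(trial) > threshold]


# Made-up samples for illustration only; these are not field measurements.
calm = [0.1, -0.1, 0.1, -0.1]
active = [1.0, -1.0, 1.0, -1.0]
print(rms(calm), rms(active))
print(len(trials_above_threshold([calm, active], threshold=0.5)))
```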

Extended Data Table 1 Parameter values

Supplementary information

Supplementary Information

This file contains: (1) on-board estimation of the navigational cues; (2) reward shaping and policy invariance; and (3) noisy gradient sensing in the turbulent atmospheric boundary layer.

About this article

Cite this article

Reddy, G., Wong-Ng, J., Celani, A. et al. Glider soaring via reinforcement learning in the field. Nature 562, 236–239 (2018). https://doi.org/10.1038/s41586-018-0533-0

  • DOI: https://doi.org/10.1038/s41586-018-0533-0
