Sessionizing Uber Trips in Real Time
Uber’s many data flows required modeling the data associated with a specific task, such as a rider trip, into a state machine. The state machine lets engineers focus on just the events needed to successfully accomplish a trip. In one sense, Uber’s challenge of efficiently matching riders and drivers in the real world comes down to the question of how to collect, store, and logically arrange data.
Our efforts to ensure low wait times by predicting rider demand, while simultaneously enabling drivers to use the platform as effectively as possible by taking into account traffic and other factors, only magnifies the scope of data involved. To better focus how we manage the massive amounts of real-time data over multiple systems that make up the Uber Marketplace, we developed the Rider Session State Machine, a methodology that models the flow of all the data events that make up a single trip. We refer to the data underlying each trip as a session, which begins when a user opens the Uber app.
That action triggers a string of data events, from when the rider actually requests a ride to the point where the trip has been completed. As each session occurs within a finite period of time, we can more easily organize the relevant data to be used for future analysis to further enhance our services. Among other functions, categorizing Uber’s trip data into sessions makes it easier to understand and uncover issues or introduce new features.