Souffl | RATP: Designing a voice interface

– Urbanopolis.
This is the RATP Group’s network of innovation sites. Its aim is to share ideas to promote the advancement of the city and its residents.

– Passenger and Digital Information Department.
A department within RATP that develops customized mobile applications. These carry embedded static information and, when connected to the network, dynamic information.

– Snips (now Sonos Voice Experience).
As the provider of voice recognition technology, it offered a technical solution with two key advantages:

– genuine respect for users’ privacy (privacy by design); the voice recognition is performed by the device itself and no user data is sent to any server,
– real quality in the recognition of user queries, with the use of artificial intelligence (deep learning).

Developing ideas today for the city of tomorrow.

Raising awareness of new uses.

Improving the customer experience is one of the RATP Group’s strategic priorities. We were approached to assist in the development of an intelligent, autonomous voice assistant capable of providing passengers with traffic status and route schedules in real time, on RATP’s metro, tramway, and RER networks.

The goal was twofold: to educate employees and partners on these new uses (B2B), and to explore new services for the network’s customers (B2C).

An interactive voice experience.

Beyond the technical challenges involved in developing such a system, the main difficulty was optimizing the interactive voice experience in terms of relevance and quality. The better the experience, the more readily users will accept it.
To achieve this goal, dialogue design played a pivotal role.

A guarantee of effectiveness, this assistant grew out of a collaboration between RATP and Souffl: the perfect blend of industry expertise, user knowledge, and practical skills in design and technology.

We learned together, overcame technical difficulties, cleared production hurdles, and successfully created a product that both parties are proud of.

Hippolyte Thiriet, Product Owner at RATP

Designing dialogue.

Determining the scope.

Before we could design the dialogue’s structure, we had to determine its scope.
In this case, it was a matter of knowing what topics would be mentioned, and thus determining what questions our assistant would have to answer.

At the start of the project, it was decided to focus on two types of requests:
– station schedules and stops
– the status of traffic on the network

We then needed to clarify this scope by listing practical use scenarios. For example:
– The user wants to know when the next train will leave a given station, possibly on a specific line going in a specific direction.
– The user wants to know when the next train will leave a certain station to arrive at another.
– The user wants the overall traffic status for a specific network or line.
– And so on.

Together, these steps established a framework conducive to designing the dialogue: they set the types of questions to expect and the types of answers to give.

Understanding the user intent.

Training the AI.

Harnessed properly, Snips technology can recognize a user’s intent within a well-defined context and with high reliability, while leaving users free to choose their own vocabulary.

To achieve this, you have to train the AI using a series of standard sentences a user might say.

For example:
“How much longer till the next RER to Auber?”

“When is the next Villiers metro for Opéra?”

“What time is the next RER A to Nation?”

And so on.

The examples above are only a representative sample. An AI trained this way is meant to understand what is asked even when the user phrases the question differently.
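As a minimal sketch of what such a training set could look like, the three example sentences can be stored with the pieces of information they carry. The intent name `next_departure`, the slot names, and the `Utterance` class are assumptions made for illustration; this is not the Snips dataset format.

```python
from dataclasses import dataclass, field


@dataclass
class Utterance:
    """One training sentence and the slot values it mentions (hypothetical schema)."""
    text: str
    slots: dict = field(default_factory=dict)


# Hypothetical intent name and slot names; the sentences come from the examples above.
TRAINING_DATA = {
    "next_departure": [
        Utterance("How much longer till the next RER to Auber?",
                  {"network": "RER", "destination": "Auber"}),
        Utterance("When is the next Villiers metro for Opéra?",
                  {"station": "Villiers", "network": "metro", "destination": "Opéra"}),
        Utterance("What time is the next RER A to Nation?",
                  {"network": "RER", "line": "A", "destination": "Nation"}),
    ],
}


def slot_names(intent):
    """Return every slot name used by the training sentences of an intent."""
    names = set()
    for utterance in TRAINING_DATA[intent]:
        names.update(utterance.slots)
    return sorted(names)


def misannotated(intent):
    """Return sentences whose annotated slot values do not appear in their text."""
    return [u.text for u in TRAINING_DATA[intent]
            if any(value not in u.text for value in u.slots.values())]
```

A check such as `misannotated("next_departure")` helps catch annotation typos before the sentences are handed to whatever training tool is actually used.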

General approach.

To build a dialogue management algorithm that favors a clear and seamless exchange, it is necessary to get a precise idea of the information the user will need to provide.
If the user’s request is not precise enough, the system has to quickly follow up with a clear and targeted question. The goal here is to acquire enough detail to be able to give a satisfactory answer.

We do not necessarily need all the information, but there is a minimum threshold of relevant data. To determine this, we need to go through various pre-defined scenarios, which are selected because they correspond to an actual need and can be effectively satisfied.
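The idea of a minimum threshold of information can be sketched as follows. The scenario names, slot names, and question wording are hypothetical; they only mirror the use scenarios listed earlier and are not taken from the actual assistant.

```python
from typing import Optional

# Hypothetical minimum information required for each pre-defined scenario.
REQUIRED_SLOTS = {
    "next_departure": ["station"],
    "journey": ["station", "destination"],
    "traffic_status": ["network_or_line"],
}

# Hypothetical follow-up question for each missing piece of information.
FOLLOW_UP = {
    "station": "Which station are you leaving from?",
    "destination": "Which station are you going to?",
    "network_or_line": "Which network or line are you asking about?",
}


def next_question(scenario: str, slots: dict) -> Optional[str]:
    """Return the follow-up question for the first missing slot, or None if the request is complete."""
    for name in REQUIRED_SLOTS[scenario]:
        if not slots.get(name):
            return FOLLOW_UP[name]
    return None
```

For instance, `next_question("journey", {"station": "Villiers"})` returns the question about the destination, while a complete request returns `None` and can be answered directly.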

In practice, if a user wants to know when their train will arrive, they will want to know this at a given station. However, if we want to avoid overwhelming the user with answers they do not need, a little more information is required.
This would include what line or direction they want to take and/or what station they want to get off at. In other words, any extra information that clarifies their need and can be used to offer the most pertinent reply possible. Of course, this need for precision depends on the station.

If a user wants to know when the next train will reach Alexandre Dumas, this is simple: the station has only one line and two directions, so they can be given a direct reply without needing further details. If, however, they make the same query about Nation, it gets much more complicated. There are four metro lines and one RER, i.e. 12 possible directions, which is far too many to read out. In this case, the user must be asked to provide the desired line, direction, or destination. With sufficient information, they can be offered the answer they want.

A list of all the various scenarios has to be drawn up, and in each case, we have to plan out the response we want in order to deliver the best possible experience.
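The Alexandre Dumas and Nation comparison can be illustrated with a small sketch. The threshold `MAX_DIRECTIONS` is an assumption, and the Nation entry uses placeholder names that only reproduce the count of 12 directions mentioned above, not the real topology.

```python
# Illustrative data: (line, direction) pairs served at each station.
STATION_DIRECTIONS = {
    "Alexandre Dumas": [("2", "Porte Dauphine"), ("2", "Nation")],
    # Placeholder pairs, only meant to reproduce the 12 directions cited in the text.
    "Nation": [(f"line-{i}", f"direction-{i}-{j}")
               for i in range(1, 5) for j in range(1, 4)],
}

# Assumption: above this number of directions, a spoken reply becomes too long.
MAX_DIRECTIONS = 4


def candidates(station, line=None, direction=None):
    """Return the (line, direction) pairs compatible with what the user has said so far."""
    return [(l, d) for l, d in STATION_DIRECTIONS[station]
            if (line is None or l == line)
            and (direction is None or d == direction)]


def needs_clarification(station, line=None, direction=None):
    """True when too many answers remain and the assistant should ask a follow-up question."""
    return len(candidates(station, line, direction)) > MAX_DIRECTIONS
```

With this data, a bare query about Alexandre Dumas can be answered directly, a bare query about Nation triggers a follow-up question, and naming a line at Nation narrows the candidates enough to answer.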

In practice.

This involves:

– clearly defining the user’s expectations (they may just want the upcoming schedule for a station, or the schedule to go from one station to another, or on a specific line, etc.)
– clearly defining what information is expected from the user
– anticipating the various potential cases (including what to do if the user provides partial information) and handling each of them

It is also necessary to set the level of precision needed, so that the user is asked for enough information to be given a useful reply. We could have settled for a small amount of information and given the user every possible answer, but then the reply would turn into an endless laundry list and be completely inefficient.

Presenting information to the user.

The first step consists of formulating the optimal replies.

This involves working on terminology: what specific vocabulary must be used?
Words and expressions must:
– be consistent with terminology in use at RATP, which is the language users are familiar with
– leave no room for misinterpretation; precision is key
– be immediately understood by users of the solution

For this part, we relied on the expertise of the Passenger Information department.

To make the system even more effective, we needed to work on how information was presented: its structure, the order it would be given in, and its hierarchy and quantity.

To formulate an unambiguous reply, we decided to go from most general to most specific.
Take the example of a user who wants to know the next few trains stopping at Villiers station:
– “at Villiers station”: the system repeats which station the query is about. This provides context and removes any doubt about a misunderstanding on the assistant’s part.
– “metro line 2”: the line is stated. Because a station can be served by several lines, schedules are listed line by line, which also removes potential ambiguity. With the same aim of leaving no room for doubt, the assistant reminds the user that this is the metro network.
– “to Porte Dauphine”: the direction is stated.
– “The next train will arrive in 2 minutes, the one after that in 7 minutes”: this is the specific information requested.
– The other directions are then given.
– Finally, the assistant moves on to line 3, following the same pattern.

Delivering information hierarchically, from most general to most specific, in an order that is logical, obvious, and predictable, lets the user instinctively ignore the information they do not need.
If the assistant first gives information on a direction the user does not care about, they instinctively put themselves “on hold” until they hear the name of the destination they want, which automatically recaptures their attention.
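The general-to-specific ordering can be sketched as a small reply builder. The function names, the data layout, and the exact sentence wording are assumptions for illustration; only the order (station, then line, then direction, then waiting times) follows the text.

```python
def minutes_text(n):
    """Spell out a waiting time, handling the singular form."""
    return f"{n} minute" if n == 1 else f"{n} minutes"


def describe_waits(minutes):
    """Describe the next one or two trains for a single direction."""
    if not minutes:
        return "no upcoming train"
    text = f"the next train will arrive in {minutes_text(minutes[0])}"
    if len(minutes) > 1:
        text += f", the one after that in {minutes_text(minutes[1])}"
    return text


def build_reply(station, network, departures):
    """Build a spoken reply ordered from most general to most specific.

    departures maps each line to a dict of direction -> waiting times in minutes.
    """
    sentences = [f"At {station} station."]               # 1. context: the station
    for line, directions in departures.items():
        sentences.append(f"{network.capitalize()} line {line}.")  # 2. network and line
        for direction, minutes in directions.items():
            # 3. direction, then 4. the information actually requested
            sentences.append(f"To {direction}: {describe_waits(minutes)}.")
    return " ".join(sentences)
```

Called with `build_reply("Villiers", "metro", {"2": {"Porte Dauphine": [2, 7]}})`, the sketch produces a reply that names the station, then the line, then the direction, and only then the waiting times.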

The design.

Creating an object in harmony with the experience.

With this project, in addition to creating the dialogue experience, we also wanted to produce the physical object that delivers it. While the chief goal in designing this object was to make it as harmonious with the experience as possible, it was also subject to more specific goals.

These included technical objectives, such as achieving good sound quality and playback, and budget objectives, i.e. keeping costs and production schedules within reason.

Electronic design.

For the electronics, and considering that we had to produce a small series of prototypes, we built on a Raspberry Pi, which we supplemented with quality off-the-shelf components whose properties we knew well.

This allowed us to keep the costs, development effort, and quality of the final product under control.

Item design.

The item design had to reflect the RATP and Urbanopolis identities while conveying an “experimental” dimension, as well as being feasible at a reasonable, controlled cost and timetable.

We chose a simple shape, a cube, and opted for wood as the material. The panels were laser cut and engraved, which made it possible to produce prototypes quickly and to showcase the product’s identity effectively.

In the end.

RATP LAB.

Five prototypes were produced and supplied to RATP for their testing and labs.
They are currently in use in various departments and labs at RATP.

Designing a Voice Interface for the RATP Network Passengers.