Duplicate Push Notification: client receives dozens of duplicates.

Hi, thank you for all the awesome work! Just want to bring to you guys’ attention this push notification issue I ran into, and see if I can get any feedback on how to proceed.

I noticed that sometimes when push notifications are sent to: https://exp.host/--/api/v2/push/send
I get back a 502 Bad Gateway error, and around the same time the client can receive the same push notification anywhere from 3 to 30 times.

Currently, our server does not retry, so I’m wondering whether the logic on Expo’s side is somehow retrying these messages many times?

It does not happen very often, but when it does happen in prod it’s pretty bad. :man_facepalming:

Hi @jakeyang! Any chance you’ve found a way to reliably cause duplicate notifications? That would definitely help in debugging this and finding out why it’s happening.

Are you using one of the Expo server SDKs to send push notifications?

Hi, thank you for the quick response.
We ourselves have been chasing this issue for quite a while. For the past few months we only had reports and screenshots of the duplicate messages from our users, until a few days ago, when it hit our admins’ accounts simultaneously.
Here is what we know so far:

  1. We are not using the Expo server SDK. We wrote our own client in Golang with Go’s HTTP package; it’s just a plain POST request without retry (on errors, we log them and move on). The requests sent to the Expo server seem to be processed multiple times.
  2. Previously we were sending about 2-3k pushes a day: we send every second (skipping if nothing is waiting in the queue), throttled to a maximum of 10 concurrent requests.
  3. We throttled it to a maximum of 2 concurrent requests yesterday and still got quite a few 502s (not sure whether any duplicates happened; we have been trying to build more detailed e2e monitoring from the client side).
  4. We do check with the Expo server using the push receipt about 30s later, to disable push tokens that have been deactivated, and we stop sending to them.
  5. It tends to happen during our peak time (around 2pm)
  6. It seems to happen when we send more than 4,000 push notifications a day (potentially 2-3 concurrent requests to Expo server temporarily)
  7. It seems (still trying to confirm) that on the days when this happens, we get a lot of this error from the Expo endpoint:
    <html>\r\n<head><title>502 Bad Gateway</title></head>\r\n<body>\r\n<center><h1>502 Bad Gateway</h1></center>\r\n<hr><center>nginx/1.17.8</center>\r\n</body>\r\n</html>
  8. I’m not 100% sure if this 502 is the problem, but it could be.
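
For anyone trying to reproduce this, here is a minimal sketch of the kind of sender described in the list above: one plain POST per message, no retry, a cap on concurrent requests, and a timestamped log line for every non-200 response. It is not the poster's actual code. The names `PushMessage`, `SendFunc`, `Result`, `NewHTTPSender` and `SendAll` are made up for illustration, and the message struct carries only a small subset of the fields Expo's push API accepts.

```go
package main

import (
	"bytes"
	"encoding/json"
	"io"
	"log"
	"net/http"
	"sync"
	"time"
)

// Endpoint quoted in the thread.
const expoPushURL = "https://exp.host/--/api/v2/push/send"

// PushMessage is a minimal subset of an Expo push message (illustrative).
type PushMessage struct {
	To    string `json:"to"`
	Title string `json:"title,omitempty"`
	Body  string `json:"body"`
}

// SendFunc makes exactly one delivery attempt and returns the HTTP status.
type SendFunc func(msg PushMessage) (int, error)

// Result records one attempt so 502s can later be lined up with duplicates.
type Result struct {
	To     string
	Status int
	Err    error
	SentAt time.Time
}

// NewHTTPSender returns a SendFunc that POSTs one message as JSON.
// It never retries: any error or non-200 status goes back to the caller.
func NewHTTPSender(client *http.Client, url string) SendFunc {
	return func(msg PushMessage) (int, error) {
		payload, err := json.Marshal(msg)
		if err != nil {
			return 0, err
		}
		req, err := http.NewRequest(http.MethodPost, url, bytes.NewReader(payload))
		if err != nil {
			return 0, err
		}
		req.Header.Set("Content-Type", "application/json")
		resp, err := client.Do(req)
		if err != nil {
			return 0, err
		}
		defer resp.Body.Close()
		_, _ = io.Copy(io.Discard, resp.Body) // drain so the connection can be reused
		return resp.StatusCode, nil
	}
}

// SendAll attempts every message once, with at most maxConcurrent requests
// in flight. maxConcurrent must be at least 1.
func SendAll(msgs []PushMessage, maxConcurrent int, send SendFunc) []Result {
	results := make([]Result, len(msgs))
	sem := make(chan struct{}, maxConcurrent) // counting semaphore
	var wg sync.WaitGroup
	for i, msg := range msgs {
		sem <- struct{}{} // blocks while maxConcurrent sends are in flight
		wg.Add(1)
		go func(i int, msg PushMessage) {
			defer wg.Done()
			defer func() { <-sem }()
			sentAt := time.Now()
			status, err := send(msg)
			results[i] = Result{To: msg.To, Status: status, Err: err, SentAt: sentAt}
			if err != nil || status != http.StatusOK {
				log.Printf("push to %s failed: status=%d err=%v at=%s",
					msg.To, status, err, sentAt.Format(time.RFC3339))
			}
		}(i, msg)
	}
	wg.Wait()
	return results
}
```

A caller would run something like `SendAll(queue, 2, NewHTTPSender(&http.Client{Timeout: 10 * time.Second}, expoPushURL))` once per tick. Because the HTTP call sits behind `SendFunc`, the same loop can be pointed at a fake sender to confirm that each message is attempted exactly once on the sending side.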

Here is our plan:
We thought about building an e2e test: the server sends a push to a test device every second with a timestamp, and the test device checks whether it receives any duplicate timestamp and reports the problem to the server. The server will also log the time of each 502 error to see whether they happen around the same time.
I can report back the result in a few days.
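
A rough sketch of the bookkeeping this plan needs on the server side, assuming each probe push carries the Unix timestamp (in seconds) at which it was sent and the test device reports every timestamp it receives. The type and function names (`DeliveryLog`, `Record`, `Duplicates`, `DuplicatesNearErrors`) are hypothetical, and the matching window is an arbitrary parameter to tune.

```go
package main

import "sort"

// DeliveryLog counts how often each probe was received. A probe is
// identified by the Unix timestamp (seconds) the server put in its payload.
// Not safe for concurrent use; guard it with a mutex if reports arrive in
// parallel.
type DeliveryLog struct {
	seen map[int64]int
}

// NewDeliveryLog returns an empty log.
func NewDeliveryLog() *DeliveryLog {
	return &DeliveryLog{seen: make(map[int64]int)}
}

// Record notes one delivery and reports whether this probe was seen before.
func (d *DeliveryLog) Record(sentUnix int64) bool {
	d.seen[sentUnix]++
	return d.seen[sentUnix] > 1
}

// Duplicates returns the probes received more than once, with their counts.
func (d *DeliveryLog) Duplicates() map[int64]int {
	dups := make(map[int64]int)
	for ts, n := range d.seen {
		if n > 1 {
			dups[ts] = n
		}
	}
	return dups
}

// DuplicatesNearErrors returns, in ascending order, the duplicated probes
// that were sent within windowSec seconds of any logged 502.
func DuplicatesNearErrors(dups map[int64]int, errorTimes []int64, windowSec int64) []int64 {
	near := []int64{}
	for ts := range dups {
		for _, et := range errorTimes {
			diff := ts - et
			if diff < 0 {
				diff = -diff
			}
			if diff <= windowSec {
				near = append(near, ts)
				break
			}
		}
	}
	sort.Slice(near, func(i, j int) bool { return near[i] < near[j] })
	return near
}
```

If most duplicated probes show up in `DuplicatesNearErrors` with a small window, that would support the 502 theory; if they don't, the duplicates are probably coming from somewhere else.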

In the meantime, I think if you have a script that sends more than 5k messages within a 24-hour period, and maybe sends 2-3 concurrent requests around 2pm (not sure if the bottleneck is with the push server overall, or has to do with volume per account), you’ll be able to log some of these errors.

My guess is this might have something to do with the 502? I wonder if the Nginx server is configured to do some sort of retry?

I have the same problem. I am using SDK 36 and expo-server-sdk-python. Any help? :frowning:

I was trying to collect more data, but it was impacting prod users and we couldn’t afford to wait, so we ended up migrating most of the users to APNS and FCM directly.
For anyone who’s struggling with this issue, check out this API:

https://docs.expo.io/versions/latest/sdk/notifications/#notificationsgetdevicepushtokenasyncconfig

You can get the underlying APNS and FCM tokens; sending to APNS and FCM is just as easy as routing through Expo’s push server.

Also, we initially thought it was because we were sending too many push notifications through Expo’s server (3-5k a day). After the migration, we send about 1-2k a day through Expo and we are still logging quite a few 502 errors. We haven’t built out the e2e test to monitor whether clients are receiving duplicate push notifications.

Thanks for the breakdown, Jake! We’re going to look into this, but I’ve created an issue so we can keep the conversation in one place: https://github.com/expo/expo/issues/8250

Hopefully, as more information comes to light, we’ll be able to identify what’s going on.
