My Expo app is crashing in prod. How do I troubleshoot it?

My Expo app (managed workflow) starts crashing in prod after a while. I was never able to reproduce the crash outside prod while developing the app. If I reinstall the app on the same device, it works fine for some time, then starts crashing again. When it crashes, it shows a black screen with a spinner for about 10 seconds, then locks the device and asks for the PIN code or fingerprint to unlock it.
My wild guess is that it starts happening when I exceed some limit on the number of system-managed resources. For instance, I'm storing tens of thousands of images in the cache folder. But this is just a theory I can't prove or disprove. I just know that after a reinstall, or if I add some code to clean up the cache folder, it does not crash. Cleaning up the cache folder kind of defeats the purpose of the cache, and I always thought the cache folder was supposed to be managed by the OS.
Does anyone have suggestions on where to start troubleshooting this?
Any help would be greatly appreciated.


I was able to confirm this for iOS. On Android it seems to work correctly.

Hi there! Here is our advice for debugging production crashes: Debugging - Expo Documentation

You can also try expo run:ios to generate the native project on your machine and build it with Xcode, which may give you more information. Read about expo run:ios here.

The problem is that it takes a while for my app to start crashing on iOS (about a month of active use), so I can't really reproduce it in non-prod.
And once it starts crashing, the only way to fix it is to push an over-the-air update that cleans the cache folder; then it becomes stable again. This is why I think my app does not have a leak and the leak is somewhere at the iOS level. Perhaps there is some dependency on the number of files cached; perhaps the Expo FileSystem library uses up all available file handles, so the app runs out of them and crashes. Of course, this is just a hypothesis. I would really appreciate it if someone is curious enough about what's going on here to help brainstorm this issue.
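One way to poke at the file-handle hypothesis is to cap how many cache reads and writes are in flight at once and see whether the crashes change. Below is a minimal sketch of a promise-based concurrency limiter (a counting semaphore). `createLimiter` and its `limit` parameter are hypothetical names, and wrapping calls such as expo-file-system's `downloadAsync` with it is an assumption, not something confirmed in this thread.

```typescript
// A minimal promise-based concurrency limiter (counting semaphore).
// Hypothetical helper: wrap each file-system call so that at most `limit`
// of them run at the same time; the rest wait in FIFO order.
type Task<T> = () => Promise<T>;

function createLimiter(limit: number) {
  if (!Number.isInteger(limit) || limit < 1) {
    throw new Error("limit must be a positive integer");
  }
  let active = 0;
  const waiting: Array<() => void> = [];

  // Free a slot and start the next queued task, if any.
  const release = () => {
    active--;
    const next = waiting.shift();
    if (next) next();
  };

  return function run<T>(task: Task<T>): Promise<T> {
    return new Promise<T>((resolve, reject) => {
      const start = () => {
        active++;
        // Promise.resolve().then(task) also catches synchronous throws.
        Promise.resolve()
          .then(task)
          .then(
            (value) => {
              release();
              resolve(value);
            },
            (err) => {
              release();
              reject(err);
            }
          );
      };
      if (active < limit) start();
      else waiting.push(start);
    });
  };
}
```

In an app you would create one limiter (say with a limit of 4) and route every cache operation through it; if the crashes persist under a tight limit, that is at least weak evidence against the open-file-handle theory.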

Are you saying that the device crashes and reboots? And that just clearing the cache gets it to start working again?

What happens if you write an app that just copies files over and over into the cache until it holds a similar number of files to your real app when the crashing happens? Does that reproduce the problem?
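A sketch of the suggested reproduction: keep copying one seed image into the cache under fresh names until the cache holds a target number of files, reporting progress so you can note the count when the symptom first appears. `fillCache`, the `CacheFs` interface, and the file-name scheme are hypothetical. On a device, `CacheFs` would plausibly be backed by expo-file-system (`copyAsync` and `readDirectoryAsync` on `FileSystem.cacheDirectory`), which is an assumption; the in-memory fake exists only so the loop can be run off-device.

```typescript
// Hypothetical abstraction over the cache directory so the loop can be
// exercised without a device. On iOS this would wrap expo-file-system.
interface CacheFs {
  copy(from: string, to: string): Promise<void>;
  list(): Promise<string[]>;
}

// Copy `seedPath` into the cache under unique names until the cache holds
// `target` files. Calls `onProgress` every `reportEvery` files.
async function fillCache(
  fs: CacheFs,
  seedPath: string,
  target: number,
  reportEvery = 1000,
  onProgress: (count: number) => void = () => {}
): Promise<number> {
  let count = (await fs.list()).length;
  let i = 0;
  while (count < target) {
    // Timestamp plus counter keeps names unique across runs.
    const name = `stress-${Date.now()}-${i++}.jpg`;
    await fs.copy(seedPath, name);
    count++;
    if (count % reportEvery === 0) onProgress(count);
  }
  return count;
}

// In-memory fake used only to run the loop off-device.
class MemoryCacheFs implements CacheFs {
  private files = new Set<string>();
  async copy(_from: string, to: string): Promise<void> {
    this.files.add(to);
  }
  async list(): Promise<string[]> {
    return Array.from(this.files);
  }
}
```

On a device you would run it with a target in the tens of thousands (matching the real app) and then keep using the app normally; because the loop starts from the existing file count, it can be re-run to top the cache up.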

Are you able to get any device logs when this happens? For example, maybe one of these will help:

https://developer.apple.com/documentation/xcode/diagnosing-issues-using-crash-reports-and-device-logs

I'm not sure it qualifies as a complete reboot, since I don't see the Apple logo. But yes, when the crash happens there is a white spinner over a black screen for about 5-10 seconds, and then it takes me to the iPhone "Touch ID or Enter Passcode" screen. Interestingly enough, I don't see any crash stats in App Store Connect.

Another interesting detail: once it starts happening, it crashes when I push the home button to minimize the app, or when I double-tap the home button and swipe the app up to close it. Not sure if that's important.
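Since the crash leaves nothing in App Store Connect and seems tied to the app going to the background, one low-tech option is to keep a small bounded log of breadcrumbs (app-state changes, cache file count, last screen) and persist it, so the next launch can show what happened just before the crash. The sketch below is plain TypeScript; `BreadcrumbLog` is a hypothetical name, and feeding it from React Native's `AppState` change events and saving `serialize()` with expo-file-system's `writeAsStringAsync` are assumptions.

```typescript
interface Breadcrumb {
  at: number; // epoch milliseconds
  event: string; // e.g. "appstate:background"
  data?: Record<string, unknown>;
}

// Bounded log: keeps only the most recent `capacity` breadcrumbs.
class BreadcrumbLog {
  private items: Breadcrumb[] = [];
  private readonly capacity: number;

  constructor(capacity = 100) {
    this.capacity = capacity;
  }

  add(event: string, data?: Record<string, unknown>, at: number = Date.now()): void {
    this.items.push({ at, event, data });
    // Drop the oldest entries once capacity is exceeded.
    if (this.items.length > this.capacity) {
      this.items.splice(0, this.items.length - this.capacity);
    }
  }

  entries(): Breadcrumb[] {
    return this.items.slice();
  }

  // JSON string suitable for writing to a file.
  serialize(): string {
    return JSON.stringify(this.items);
  }

  // Rebuild a log from a previous serialize() call, keeping the newest entries.
  static restore(json: string, capacity = 100): BreadcrumbLog {
    const log = new BreadcrumbLog(capacity);
    const parsed = JSON.parse(json) as Breadcrumb[];
    for (const b of parsed.slice(-capacity)) log.items.push(b);
    return log;
  }
}
```

Persisting after every event, rather than only when the app is backgrounded, matters here: if the process is killed during the transition, a write scheduled at that moment may never complete.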

The idea of writing an app that dumps a bunch of files into the cache until it starts crashing is an interesting one; I'd have to think about it.
In my app I was not able to determine the exact number of cached files at which it starts to become a problem. I tried pushing an over-the-air update that cleans the cache folder, leaving only a certain number of files in it. I tried 8000, 5000, 2000, and 1000. Once the issue starts, it continues until I clean up all the files; then it stays stable for a while, until it starts happening again. The only way to reproduce it is to disable the cleanup procedure and let files accumulate in the cache for a month or longer.
Unfortunately I cleared the cache recently and my app is stable again, so I can't use idevicesyslog yet. For now I'm disabling the cleanup procedure and will wait for the crash to happen again, so stay tuned. In the meantime I will still look into your advice to see if I can retrieve some older logs from my device. Thank you for your help!
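Rather than wiping the whole folder, a bounded cache can evict only the oldest files once it passes a limit, which keeps most of the cache's benefit while capping the file count. The selection logic below is a sketch; `CacheEntry`, `selectForEviction`, and the `maxFiles`/`maxBytes` limits are hypothetical. On a device the entries would plausibly come from expo-file-system's `readDirectoryAsync` plus `getInfoAsync` (which reports `size` and `modificationTime`), with the chosen names passed to `deleteAsync`; treat that wiring as an assumption. The post above also reports that pruning down to 1000 files did not stop the crashes once they had started, so this is something to experiment with, not a known fix.

```typescript
interface CacheEntry {
  name: string;
  size: number; // bytes
  modifiedAt: number; // any timestamp where larger means newer
}

// Return the names of entries to delete so that the kept set satisfies both
// limits. Keeps the newest files; this approximates LRU only if files are
// touched when read, otherwise it is first-in-first-out.
function selectForEviction(
  entries: CacheEntry[],
  maxFiles: number,
  maxBytes: number = Number.POSITIVE_INFINITY
): string[] {
  // Newest first, so we keep a prefix and evict everything after it.
  const sorted = entries.slice().sort((a, b) => b.modifiedAt - a.modifiedAt);
  const keep = new Set<string>();
  let bytes = 0;
  for (const e of sorted) {
    if (keep.size >= maxFiles || bytes + e.size > maxBytes) break;
    keep.add(e.name);
    bytes += e.size;
  }
  return sorted.filter((e) => !keep.has(e.name)).map((e) => e.name);
}
```

Running this at launch (or after each batch of downloads) keeps the folder at a steady size instead of oscillating between "huge" and "empty".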


Just to follow up, the older logs do not reveal anything strange or suspicious.

It finally started happening again today.
In one window I'm running "idevicesyslog > ios.log".
In another, "tail -f ios.log | grep -i error | grep -i echowaves" to show only errors from my app.
Here is what I’m getting:

May 31 18:10:09 kernel(Sandbox)[0] <Error>: Sandbox: ComEchowaves(13046) deny(2) file-test-existence /private/etc/.mdns_debug
May 31 18:10:10 ComEchowaves(UIKitCore)[13046] <Error>: Unable to simultaneously satisfy constraints.
May 31 18:10:10 ComEchowaves(UIKitCore)[13046] <Error>: Unable to simultaneously satisfy constraints.
May 31 18:10:10 ComEchowaves(UIKitCore)[13046] <Error>: Unable to simultaneously satisfy constraints.
May 31 18:10:10 ComEchowaves(UIKitCore)[13046] <Error>: Unable to simultaneously satisfy constraints.
May 31 18:10:10 kernel(Sandbox)[0] <Error>: Sandbox: ComEchowaves(13046) deny(1) file-test-existence /private/var/Managed Preferences/mobile/com.apple.CoreMotion.plist
May 31 18:10:10 ComEchowaves(AppSSOCore)[13046] <Notice>: configurationWithCompletion: success = YES, error = (null)
May 31 18:10:10 ComEchowaves(CFNetwork)[13046] <Error>: BackgroundSession <7434DC88-2C15-4396-ABC2-13ED289A237B> connection to background transfer daemon invalidated
May 31 18:10:10 ComEchowaves(libnetwork.dylib)[13046] <Error>: nw_endpoint_handler_set_adaptive_read_handler [C1.1.1 IPv4#8562f670:443 ready channel-flow (satisfied (Path is satisfied), viable, interface: en0, ipv4, dns)] unregister notification for read_timeout failed
May 31 18:10:10 ComEchowaves(libnetwork.dylib)[13046] <Error>: nw_endpoint_handler_set_adaptive_write_handler [C1.1.1 IPv4#8562f670:443 ready channel-flow (satisfied (Path is satisfied), viable, interface: en0, ipv4, dns)] unregister notification for write_timeout failed
May 31 18:10:10 ComEchowaves(libnetwork.dylib)[13046] <Error>: nw_endpoint_handler_set_adaptive_read_handler [C2.1.1 IPv4#8562f670:443 ready channel-flow (satisfied (Path is satisfied), viable, interface: en0, ipv4, dns)] unregister notification for read_timeout failed
May 31 18:10:10 ComEchowaves(libnetwork.dylib)[13046] <Error>: nw_endpoint_handler_set_adaptive_write_handler [C2.1.1 IPv4#8562f670:443 ready channel-flow (satisfied (Path is satisfied), viable, interface: en0, ipv4, dns)] unregister notification for write_timeout failed
May 31 18:10:10 ComEchowaves(libsystem_networkextension.dylib)[13046] <Error>: nehelper sent invalid result code [1] for Wi-Fi information request
May 31 18:10:10 ComEchowaves(libsystem_networkextension.dylib)[13046] <Error>: nehelper sent invalid result code [1] for Wi-Fi information request
May 31 18:10:10 ComEchowaves(libsystem_networkextension.dylib)[13046] <Error>: nehelper sent invalid result code [1] for Wi-Fi information request
May 31 18:10:10 ComEchowaves(libsystem_networkextension.dylib)[13046] <Error>: nehelper sent invalid result code [1] for Wi-Fi information request
May 31 18:10:10 ComEchowaves(libnetwork.dylib)[13046] <Error>: nw_endpoint_handler_set_adaptive_read_handler [C3.1.1 IPv4#209c4f3f:443 ready channel-flow (satisfied (Path is satisfied), viable, interface: en0, ipv4, dns)] unregister notification for read_timeout failed
May 31 18:10:10 ComEchowaves(libnetwork.dylib)[13046] <Error>: nw_endpoint_handler_set_adaptive_write_handler [C3.1.1 IPv4#209c4f3f:443 ready channel-flow (satisfied (Path is satisfied), viable, interface: en0, ipv4, dns)] unregister notification for write_timeout failed
May 31 18:10:12 ComEchowaves(libnetwork.dylib)[13046] <Error>: nw_endpoint_handler_set_adaptive_read_handler [C4.1.1 IPv4#8562f670:443 ready channel-flow (satisfied (Path is satisfied), viable, interface: en0, ipv4, dns)] unregister notification for read_timeout failed
May 31 18:10:12 ComEchowaves(libnetwork.dylib)[13046] <Error>: nw_endpoint_handler_set_adaptive_write_handler [C4.1.1 IPv4#8562f670:443 ready channel-flow (satisfied (Path is satisfied), viable, interface: en0, ipv4, dns)] unregister notification for write_timeout failed
May 31 18:10:12 ComEchowaves(libnetwork.dylib)[13046] <Error>: nw_endpoint_handler_set_adaptive_read_handler [C5.1.1 IPv4#8562f670:443 ready channel-flow (satisfied (Path is satisfied), viable, interface: en0, ipv4, dns)] unregister notification for read_timeout failed
May 31 18:10:12 ComEchowaves(libnetwork.dylib)[13046] <Error>: nw_endpoint_handler_set_adaptive_write_handler [C5.1.1 IPv4#8562f670:443 ready channel-flow (satisfied (Path is satisfied), viable, interface: en0, ipv4, dns)] unregister notification for write_timeout failed
May 31 18:10:12 ComEchowaves(libnetwork.dylib)[13046] <Error>: nw_endpoint_handler_set_adaptive_read_handler [C6.1.1 IPv4#f83b933e:443 ready channel-flow (satisfied (Path is satisfied), viable, interface: en0, ipv4, dns)] unregister notification for read_timeout failed
May 31 18:10:12 ComEchowaves(libnetwork.dylib)[13046] <Error>: nw_endpoint_handler_set_adaptive_write_handler [C6.1.1 IPv4#f83b933e:443 ready channel-flow (satisfied (Path is satisfied), viable, interface: en0, ipv4, dns)] unregister notification for write_timeout failed
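The grep pipeline works, but with this much repeated noise it can help to parse each line and count errors per subsystem, so one-off entries (like the FontServices line that shows up later in the thread) stand out from the recurring libnetwork.dylib messages. The sketch below assumes the line layout seen in this capture (`Mon DD HH:MM:SS process(subsystem)[pid] <Level>: message`); `parseSyslogLine` and `countErrorsBySubsystem` are hypothetical helpers, and lines that do not match (such as wrapped continuations) are skipped.

```typescript
interface SyslogEntry {
  timestamp: string;
  process: string;
  subsystem: string | null;
  pid: number;
  level: string;
  message: string;
}

// Matches lines like:
// May 31 18:10:09 kernel(Sandbox)[0] <Error>: message...
const LINE =
  /^(\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2})\s+([^\s(\[]+)(?:\(([^)]*)\))?\[(\d+)\]\s+<(\w+)>:\s?(.*)$/;

function parseSyslogLine(line: string): SyslogEntry | null {
  const m = LINE.exec(line);
  if (!m) return null;
  return {
    timestamp: m[1],
    process: m[2],
    subsystem: m[3] ?? null,
    pid: Number(m[4]),
    level: m[5],
    message: m[6],
  };
}

// Count <Error> lines per subsystem for one process (case-insensitive).
// Unlike `grep -i echowaves`, this matches only the emitting process, so
// kernel sandbox lines that merely mention the app are not counted, and
// Notice lines containing the word "error" are ignored.
function countErrorsBySubsystem(lines: string[], processName: string): Map<string, number> {
  const counts = new Map<string, number>();
  const wanted = processName.toLowerCase();
  for (const line of lines) {
    const e = parseSyslogLine(line);
    if (!e || e.level !== "Error" || e.process.toLowerCase() !== wanted) continue;
    const key = e.subsystem ?? "(none)";
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```

Comparing these counts between a "healthy" session and a crashing one is one way to spot which subsystem only misbehaves when the cache is large.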

And right after the app crashes I see this in the log:

May 31 18:10:27 runningboardd(RunningBoard)[32] <Error>: [application<com.echowaves>:13046] terminate_with_reason() failed with error: -1
May 31 18:10:27 runningboardd(RunningBoard)[32] <Error>: RBSStateCapture remove item called for untracked item <RBConnectionClient| 13046 name:application<com.echowaves> entitlements:<RBEntitlements| [

No clue what I'm looking at. Any ideas, anyone?

Sometimes I see this:

May 31 19:04:58 ComEchowaves(AppSSOCore)[15805] <Error>: <SOServiceConnection: 0x282d487c0>: XPC connection interrupted
May 31 19:04:58 ComEchowaves(CFNetwork)[15805] <Error>: BackgroundSession <03B3BCED-79D2-488B-9326-ACB8EF300125> connection to background transfer daemon interrupted
May 31 19:04:58 ComEchowaves(CFNetwork)[15805] <Error>: BackgroundSession <06A7C8FF-2A80-4290-A386-60E582DA3F31> connection to background transfer daemon interrupted
May 31 19:04:58 ComEchowaves(CFNetwork)[15805] <Error>: BackgroundSession <06A7C8FF-2A80-4290-A386-60E582DA3F31> connection to background transfer daemon invalidated
May 31 19:04:58 ComEchowaves(CFNetwork)[15805] <Error>: BackgroundSession <03B3BCED-79D2-488B-9326-ACB8EF300125> connection to background transfer daemon invalidated

May 31 19:04:58 ComEchowaves(FontServices)[15805] <Error>: <private>

I may have solved the issue. The last log line gave me a clue:
May 31 19:04:58 ComEchowaves(FontServices)[15805] <Error>: <private>
I noticed that my app was using FontAwesome from native-base rather than from Expo. Once I switched to Expo's FontAwesome and pushed an over-the-air update, the app stopped crashing. I will keep monitoring it for some time. It's hard for me to explain why something like FontAwesome would cause the crash; if anyone has a hypothesis to share, please let me know.

I may have spoken too soon. It's still happening, in different cases and less frequently. Still troubleshooting.

@dmitryame, would Sentry.io perhaps help with tracing this? There's a nice detailed guide for integrating sentry-expo over here… I've only just started with it myself, so this is a wild shot in the dark, but it might be worth a try. :man_shrugging:

Not sure it will work for me; here is the note from the docs:

Note: Native crash reporting is not available with sentry-expo in the managed workflow.

Do you have docs or a code snippet for this change? I'm also using native-base and FontAwesome5. I followed the native-base docs on using Ionicons from Expo when loading fonts, and that is all I can find in the documentation about setting up native-base with Expo.

Ah, my apologies, I didn’t see that part…

My change didn't really fix the issue, so I'm not sure it's worth sharing here. @jdobry, are you running into crashes as well? You can look at my code here: WiSaw/App.js at 91d390980529615b805ab065fa304e57412ce20e · echowaves/WiSaw · GitHub

So, my last change (using FontAwesome from Expo instead of native-base) didn't fix the issue.
I'm still getting this error in the device log during the crash:

Jun 4 12:00:28 ComEchowaves(FontServices)[432] <Error>: <private>

It seems there is some resource conflict between Expo and iOS. The crash doesn't get reported in App Store Connect, and when I reproduce the crash during a phone call, the call gets dropped, which looks like a device or iOS crash rather than an app crash. But it only happens with my app, and it has happened on other devices too, so it does seem like an Expo-caused issue.

I don't know how to proceed from here. Any advice, please?
I would really appreciate the Expo team chiming in here. Thank you!

I asked someone who knows more about this stuff than I do and he said the following:

Oof, that's a pretty hard one. I would try these options (in this order):

  • Try to get the actual crash log from a device
  • Try in production mode with Expo Go
  • Try expo run:ios to run an ejected version on a simulator/actual device
  • Try starting with Xcode (on ejected state) in production

And to give a bit more context: :grinning_face_with_smiling_eyes:

  1. Try to get the actual crash log from a device
    This might point to a specific method that we can investigate further. Usually, it’s pretty hard to read or retrieve this log. (this might help)

  2. Try in production mode with Expo Go
    If you are “lucky enough”, you might be able to reproduce the error when running in production mode with Expo Go. (check out this)

  3. Try expo run:ios to run an ejected version on a simulator/actual device
    You might need to eject for this, but it might help you avoid the whole build workflow. You also get more debugging options, like Xcode logs and stuff like that.

  4. Try starting with Xcode (on ejected state) in production
    This is a final attempt if the previous ones aren't helping, but it should replicate the exact same environment. If you can't find it here, I'm not sure where to search further.


Still eliminating one potential cause at a time: I spent time refactoring my app from native-base to react-native-elements, something I wanted to do for a while anyway, since I suspected there might be some weird FontAwesome conflict (based on the error log I'm getting during the crash: Jun 4 12:00:28 ComEchowaves(FontServices)[432] <Error>: <private>). I ended up simplifying a lot of things (native-base's component anatomy is needlessly complex), and the app overall looks better and the code is more maintainable.

Unfortunately this didn’t fix the issue. Of course I made sure I removed the native-base dependencies from package.json.

Now on to the next thing to try. I still suspect it has something to do with the number of files I'm storing in the filesystem cache. I will try dumping cache files excessively and see if I can reproduce the issue in non-prod; stay tuned.
