Submission log failures without an error message
Incident Report for Crossref
Postmortem

Release v0.2.0 of the Content System contained a bug in its handling of component DOIs. As a result, any deposit that contained component DOIs failed to process. The submission log message for an affected submission includes the entry <record_diagnostic status="Failure">, and its stack trace includes "NullPointerException". The full stack trace is included further down.
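
As a rough illustration of how a member might check a saved submission log for the two symptoms described above, here is a minimal Python sketch. The two marker strings are quoted from this report; the function name and the layout of the sample text are invented for the example and do not reflect the exact format of a real submission log.

```python
# Marker strings quoted in this report.
FAILURE_MARKER = '<record_diagnostic status="Failure">'
EXCEPTION_MARKER = "NullPointerException"


def looks_affected(log_text: str) -> bool:
    """Return True if the text shows both symptoms of this incident.

    Hypothetical helper: it only does a substring check and does not
    parse the log as XML.
    """
    return FAILURE_MARKER in log_text and EXCEPTION_MARKER in log_text


# Made-up sample text; only the two markers and the first stack frame
# come from this report.
sample = (
    '<record_diagnostic status="Failure">'
    "java.lang.NullPointerException at "
    "org.crossref.ds.metadata.AbstractMetadataProcessor.validateDOIs"
    "(AbstractMetadataProcessor.java:39)"
)

print(looks_affected(sample))                        # True
print(looks_affected("java.lang.RuntimeException"))  # False
```

A failure entry without the NullPointerException is deliberately not matched, since a deposit can fail for reasons unrelated to this incident.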

We identified the issue and rolled back to the previous release, v0.1.2. The rollback completed on Wednesday 2019-08-21T14:27 UTC, after which new submissions were no longer subject to the bug.
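
To check whether a given submission falls inside the affected period, something like the following sketch could be used. The window start is inferred from the status updates below (deployment at 1500 UTC on the day before 2019-08-21), and the end is the rollback time given above. The function name is hypothetical, and it is an assumption that the relevant time is when the submission was processed.

```python
from datetime import datetime, timezone

# Start: deployment time inferred from the status updates.
# End: rollback completion time from the postmortem.
WINDOW_START = datetime(2019, 8, 20, 15, 0, tzinfo=timezone.utc)
WINDOW_END = datetime(2019, 8, 21, 14, 27, tzinfo=timezone.utc)


def in_incident_window(processed_at: datetime) -> bool:
    """Return True if a timezone-aware datetime lies in the affected window.

    Hypothetical helper. Passing a naive datetime raises TypeError,
    because naive and aware datetimes cannot be compared.
    """
    return WINDOW_START <= processed_at <= WINDOW_END


print(in_incident_window(datetime(2019, 8, 21, 9, 0, tzinfo=timezone.utc)))   # True
print(in_incident_window(datetime(2019, 8, 21, 15, 0, tzinfo=timezone.utc)))  # False
```

Falling inside the window does not by itself mean a submission was affected; only deposits containing component DOIs failed for this reason.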

On Thursday 2019-08-22T12:00 we began to reprocess the 4593 submissions that we believe failed for this reason. For each submission, this may have one of the following effects:

  • The submission may be successfully re-processed.
  • If the member has since made a deposit of the same data with a newer timestamp, our re-process may fail with the message that the timestamp is incorrect. This is an expected fail-safe.
  • If the deposit contained another error, that error may be raised.
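
The three outcomes above can be sketched as a small decision function. This is an illustration only: all names are hypothetical, the rule that a deposit must carry a newer timestamp than the last accepted one is an assumption inferred from the "timestamp is incorrect" message, and the order of the checks is arbitrary rather than a description of how the Content System works.

```python
from typing import Optional


def reprocess_outcome(
    deposit_timestamp: int,
    latest_accepted_timestamp: Optional[int],
    other_error: Optional[str] = None,
) -> str:
    """Classify what reprocessing one failed submission might produce."""
    # Fail-safe: a newer deposit was already accepted, so the older
    # submission is rejected rather than overwriting it.
    if (
        latest_accepted_timestamp is not None
        and deposit_timestamp <= latest_accepted_timestamp
    ):
        return "rejected: timestamp is incorrect"
    # An unrelated problem in the deposit is reported as usual.
    if other_error is not None:
        return f"failed: {other_error}"
    # Nothing stands in the way of processing.
    return "success"


# Illustrative values only; the timestamps and error text are made up.
print(reprocess_outcome(20190820153000, None))            # success
print(reprocess_outcome(20190820153000, 20190821090000))  # rejected: timestamp is incorrect
print(reprocess_outcome(20190820153000, None, "example error"))  # failed: example error
```

The point of the fail-safe is that a rejected older submission leaves the member's newer deposit untouched.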

The reprocessing has been completed.

Affected submissions would have had the following message (which can be viewed in the Admin system):

java.lang.NullPointerException
    at org.crossref.ds.metadata.AbstractMetadataProcessor.validateDOIs(AbstractMetadataProcessor.java:39)
    at org.crossref.ds.metadata.MetadataProcessorImpl.processMetadata(MetadataProcessorImpl.java:265)
    at org.crossref.ds.submissionprocessor.DataSubmissionProcessorThread.processSubmission(DataSubmissionProcessorThread.java:109)
    at org.crossref.ds.submissionprocessor.DataSubmissionProcessorThread.execute(DataSubmissionProcessorThread.java:184)
    at org.crossref.qs.mbeanthread.MBeanThread.run(MBeanThread.java:84)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Posted Aug 22, 2019 - 17:37 UTC

Resolved
We have re-processed the nearly 5000 failed submissions from this incident.

Most submissions should have been successfully re-processed and you will have received a submission email confirming this. However, if you made a deposit with the same data and a newer timestamp, you will have received a submission email with a message that the timestamp is incorrect. You can ignore this as your newer submission will still be there.

If your deposit contained an entirely separate error, you will have received a submission email with the relevant error message - please deal with this in your usual way.
Posted Aug 22, 2019 - 16:36 UTC
Update
We're about to start re-processing the nearly 5000 failed submissions from this incident.

Most submissions should be successfully re-processed and you'll receive a submission email confirming this. However, if you've made a deposit with the same data and a newer timestamp in the meantime, you'll receive a submission email with a message that the timestamp is incorrect. You can ignore this as your newer submission will still be there.

If your deposit contained an entirely separate error, you'll receive a submission email with the relevant error message - please deal with this in your usual way.
Posted Aug 22, 2019 - 11:38 UTC
Update
We have identified the nearly 5000 failed submissions that were potentially affected in this incident. We will reprocess all of these failed submissions tomorrow and update you once we have done so.
Posted Aug 21, 2019 - 19:47 UTC
Update
We have rolled back yesterday's deployment and are now working on reprocessing all submissions that failed between yesterday's deployment at 1500 UTC and our rollback.
Posted Aug 21, 2019 - 15:05 UTC
Identified
Since 1500 UTC yesterday, some content registrations have been failing and not providing an error message to identify the issue.

We think this problem is related to our latest deployment yesterday. We're currently rolling back the deployment and will update as soon as we have more information.
Posted Aug 21, 2019 - 14:05 UTC
This incident affected: Content Registration (Deposit system).