By the end of May 2018, the ticketing system at https://ticket.opensciencegrid.org will be retired and support will be provided at https://support.opensciencegrid.org. Throughout this transition the support email will remain available as a point of contact.

Please see the service migration page for details: https://opensciencegrid.github.io/technology/policy/service-migrations-spring-2018/#ticket
Contact: David Urbina
Details: CMS, GGUS, David Urbina, USCMS, Elevated, Resolved, TX Incoming, 2017-07-11
Assignees:
FNAL (Prod) INC000000865799 / Ticket Exchange
GGUS (Prod) 129300 / Ticket Exchange

Past Updates
by OSG-GOC 
[Closed/Resolved by Caller]
Closed by Caller
-- by  at Sat Jul 15 05:00:19 UTC 2017

GOCTX Source: https://fermi.service-now.com/nav_to.do?uri=incident.do?sys_id=INC000000865799
by OSG-GOC 
This ticket was closed automatically. If this issue has not been resolved according to your expectations, or if you need further information, please submit a new ticket referencing this one.

-- by GGUS SYSTEM at Wed Jul 26 23:59:00 UTC 2017

GOCTX Source: https://ggus.eu/ws/ticket_info.php?ticket=129300
by OSG-GOC 
Thanks Chih-Hao.

Transfer quality has been good for the last 18 hours. I'm closing the ticket.

David Urbina
Transfer Team

-- by David Urbina at Tue Jul 11 17:41:13 UTC 2017

GOCTX Source: https://ggus.eu/ws/ticket_info.php?ticket=129300
by OSG-GOC 
Hi Huangch,
Indeed, it seems that the network performance has improved [1].
We'll keep an eye on progress for transfers coming out of FNAL.

Sebastian.

[1] https://monit-kibana.cern.ch/app/kibana#/discover?_g=(refreshInterval:(display:Off,pause:!f,value:0),time:(from:now-7d,mode:quick,to:now))&_a=(columns:!(data.dst_url,data.t__error_message,data.log_link),filters:!((%27$state%27:(store:appState),meta:(alias:!n,disabled:!f,index:%27monit_prod_fts_enr_complete_*%27,key:data.vo,negate:!f,value:cms),query:(match:(data.vo:(query:cms,type:phrase))))),index:%27monit_prod_fts_enr_complete_*%27,interval:auto,query:(query_string:(analyze_wildcard:!t,lowercase_expanded_terms:!f,query:%27data.src_experiment_site:%20T1_US_FNAL%27)),sort:!(metadata.event_timestamp,desc),vis:(aggs:!((params:(field:data.vo,orderBy:%272%27,size:20),schema:segment,type:terms),(id:%272%27,schema:metric,type:count)),type:histogram))&indexPattern=monit_prod_fts_enr_complete_*&type=histogram

-- by Sebastian Pulido at Tue Jul 11 07:58:55 UTC 2017

GOCTX Source: https://ggus.eu/ws/ticket_info.php?ticket=129300
by OSG-GOC 
These files are on one store node: cmsstor614.fnal.gov.
We found that its network performance was poor.
The network issue has been resolved.
The transfers should be much faster now.
Let us know if the problem is resolved.
-- by huangch at Mon Jul 10 23:09:02 UTC 2017

GOCTX Source: https://fermi.service-now.com/nav_to.do?uri=incident.do?sys_id=INC000000865799
by OSG-GOC 
Additionally, I see that most of the files related to the latest errors are particularly large, or at least above average [1]. I don't think this is the reason the timeouts are happening, but I wanted to provide you with this information.

David Urbina
Transfer Team

[1]
to T1_FR_CCIN2P3_Disk
/store/data/Run2017A/MinimumBias/RECO/PromptReco-v1/000/295/134/00000/7CB9FB1C-0B45-E711-BA80-02163E014417.root => 10.49GB
/store/data/Run2017A/MinimumBias/RECO/PromptReco-v1/000/294/952/00000/7E70B9F0-C044-E711-87F8-02163E01A2E4.root => 10.99GB
/store/data/Run2017A/MinimumBias/RECO/PromptReco-v1/000/294/987/00000/9EC90E5E-CB44-E711-A8C0-02163E014734.root => 9.03GB
/store/data/Run2017A/MinimumBias/RECO/PromptReco-v1/000/295/128/00000/3888BBC4-FA44-E711-893C-02163E012662.root => 3.65GB
/store/data/Run2017A/MinimumBias/RECO/PromptReco-v1/000/294/989/00000/26707D23-CD44-E711-9AE1-02163E01A289.root => 15.75GB
/store/data/Run2017A/MinimumBias/RECO/PromptReco-v1/000/294/993/00000/AAE58B0D-D144-E711-987D-02163E01A21A.root => 11.40GB
/store/data/Run2017A/MinimumBias/RECO/PromptReco-v1/000/295/128/00000/DAE5D4BA-FA44-E711-9758-02163E0136BC.root => 7.92GB

to T1_UK_RAL_Buffer
/store/mc/RunIISummer16MiniAODv2/ZprimeToTprimeT_TprimeToWB_Zp3500WID_Tp1800Nar_LH_TuneCUETP8M2T4_13TeV-madgraph-pythia8/MINIAODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/110000/E0E247BF-D95E-E711-9ABC-0025905B8564.root => 1.54GB
/store/mc/PhaseIFall16DR/SingleNeutrino/AODSIM/FlatPU28to62HcalNZSRAW_90X_upgrade2017_realistic_v6_C1-v1/90001/CEDFE736-820E-E711-84A7-0CC47A4C8E0E.root => 4.15GB
/store/data/Run2016D/SingleElectron/AOD/18Apr2017-v1/120001/C2114CF2-8B37-E711-A81C-0025905B855A.root => 3.99GB
/store/mc/PhaseIFall16DR/SingleNeutrino/AODSIM/FlatPU28to62HcalNZSRAW_90X_upgrade2017_realistic_v6_C1-v1/90002/BC1608F9-0B0F-E711-BED0-0CC47A4C8E2E.root => 4.14GB
/store/data/Run2016D/SingleElectron/AOD/18Apr2017-v1/120003/1A624928-F137-E711-9394-0025905A60B0.root => 2.69GB
/store/mc/PhaseIFall16DR/SingleNeutrino/AODSIM/FlatPU28to62HcalNZSRAW_90X_upgrade2017_realistic_v6_C1-v1/90001/26CA1E47-800E-E711-B5A4-0CC47A4C8E8A.root => 4.14GB
/store/data/Run2016D/SingleElectron/AOD/18Apr2017-v1/120003/341F56D0-BF37-E711-A821-0CC47A4D76C0.root => 4.09GB
/store/data/Run2016D/SingleElectron/AOD/18Apr2017-v1/120002/8EE697F1-7737-E711-8B60-003048FFCBB2.root => 4.06GB
/store/mc/PhaseIFall16DR/SingleNeutrino/AODSIM/FlatPU28to62HcalNZSRAW_90X_upgrade2017_realistic_v6_C1-v1/90003/B4E6CAFE-BA10-E711-BB0B-0025905B855A.root => 4.15GB
/store/mc/PhaseIFall16DR/SingleNeutrino/AODSIM/FlatPU28to62HcalNZSRAW_90X_upgrade2017_realistic_v6_C1-v1/130000/D05873BA-AC18-E711-89E2-0CC47A78A41C.root => 4.15GB

to T1_ES_PIC_Buffer
/store/data/Run2016E/DoubleMuon/AOD/18Apr2017-v1/120000/F4F12561-652B-E711-8EEE-0CC47A4D75F8.root => 3.41GB
/store/data/Run2016E/DoubleMuon/AOD/18Apr2017-v1/60001/10613115-6829-E711-984B-0CC47A4D7662.root => 3.65GB
/store/data/Run2016E/DoubleMuon/AOD/18Apr2017-v1/60001/6C2A3A6B-8429-E711-9F53-0025905A497A.root => 4.17GB
/store/mc/CosmicFall16PhaseIDR/TKCosmics_0T/GEN-SIM-RECO/DECO_B0T_90X_upgrade2017cosmics_realistic_deco_v18-v1/00000/98CCCD08-9122-E711-8C4A-0CC47A4D7664.root => 4.69GB
/store/mc/CosmicFall16PhaseIDR/TKCosmics_0T/GEN-SIM-RECO/DECO_B0T_90X_upgrade2017cosmics_realistic_deco_v18-v1/00000/24824C58-A922-E711-A88E-0025905B855E.root => 4.70GB
/store/data/Run2016E/DoubleMuon/AOD/18Apr2017-v1/60001/84912498-6B29-E711-BC53-0CC47A4C8E96.root => 3.88GB
/store/mc/CosmicFall16PhaseIDR/TKCosmics_0T/GEN-SIM-RECO/DECO_B0T_90X_upgrade2017cosmics_realistic_deco_v18-v1/80000/DA18763A-8D22-E711-9D29-0CC47A4D7692.root => 4.72GB
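
(Individual sizes like the ones listed above can be cross-checked against the FNAL xrootd redirector used elsewhere in this ticket; the stat call below is only a sketch, using one of the listed files as an example.)

xrdfs cmsxrootd.fnal.gov stat /store/data/Run2017A/MinimumBias/RECO/PromptReco-v1/000/295/134/00000/7CB9FB1C-0B45-E711-BA80-02163E014417.root
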
by OSG-GOC 
Thanks for checking. I do think that there is a problem. I want to provide you with more examples of failures that have been happening over the last couple of hours. The file we used above as an example was successfully transferred, but transfer quality to T1_UK_RAL_Buffer, T1_FR_CCIN2P3_Disk, and T1_ES_PIC_Buffer is still bad [1].

I attach the list of files (PFNs) related to the last 100 errors per link. All of them report: "TRANSFER TRANSFER  Operation timed out".

I have tested transferring some of them to lxplus using gfal-copy and I am getting rates below 1MB/s, with one exception: file [2], which I was able to transfer at a speed of ~65MB/s. However, even in that case, if I check the logs of the corresponding FTS3 job [3], the average rate has been less than 1MB/s.
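
A manual check along these lines can be done directly with gfal-copy (sketch only; the local destination path is illustrative, and the source SURL is the one given in [2]):

time gfal-copy -f "srm://cmsdcadisk01.fnal.gov:8443/srm/managerv2?SFN=/dcache/uscmsdisk/store/mc/RunIISummer16MiniAODv2/ZprimeToTprimeT_TprimeToWB_Zp3500WID_Tp1800Nar_LH_TuneCUETP8M2T4_13TeV-madgraph-pythia8/MINIAODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/110000/E0E247BF-D95E-E711-9ABC-0025905B8564.root" file:///tmp/E0E247BF-D95E-E711-9ABC-0025905B8564.root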

[1] https://cmsweb.cern.ch/phedex/prod/Activity::QualityPlots?graph=quality_all&entity=link&src_filter=T1_US_FNAL&dest_filter=T0%7CT1%7CT2_CH_CERN&no_mss=true&period=l72h&upto=

[2] srm://cmsdcadisk01.fnal.gov:8443/srm/managerv2?SFN=/dcache/uscmsdisk/store/mc/RunIISummer16MiniAODv2/ZprimeToTprimeT_TprimeToWB_Zp3500WID_Tp1800Nar_LH_TuneCUETP8M2T4_13TeV-madgraph-pythia8/MINIAODSIM/PUMoriond17_80X_mcRun2_asymptotic_2016_TrancheIV_v6-v1/110000/E0E247BF-D95E-E711-9ABC-0025905B8564.root

[3] https://lcgfts06.gridpp.rl.ac.uk:8449/var/log/fts3/2017-07-05/cmsdcadisk01.fnal.gov__srm-cms.gridpp.rl.ac.uk/2017-07-05-1810__cmsdcadisk01.fnal.gov__srm-cms.gridpp.rl.ac.uk__558413313__954817c2-61a3-11e7-a54a-001dd8b71cf7

David Urbina
Transfer Team

-- by David Urbina at Thu Jul 06 07:43:13 UTC 2017

Added attachment files_relatedtimeouts_fromFNAL.txt
https://ggus.eu/index.php?mode=download&attid=ATT107710

-- by David Urbina at Thu Jul 06 07:43:11 UTC 2017

GOCTX Source: https://ggus.eu/ws/ticket_info.php?ticket=129300
by OSG-GOC 
We did an srmcp locally (on cmslpc27.fnal.gov) for the file in question and got a 63MB/s transfer rate, which seems pretty normal.

bash-4.1$ srmcp -2 -debug "srm://cmsdcadisk01.fnal.gov:8443/srm/managerv2?SFN=/dcache/uscmsdisk/store/data/Run2016H/Tau/MINIAOD/18Apr2017-v1/10000/E820E87A-7F53-E711-8DA7-0025905A60F4.root" file:////dev/null
...

and xrdcp shows a similar rate:

bash-4.1$ xrdcp -f root://cmsxrootd.fnal.gov//store/data/Run2016H/Tau/MINIAOD/18Apr2017-v1/10000/E820E87A-7F53-E711-8DA7-0025905A60F4.root /dev/null
[2.034GB/2.034GB][100%][==================================================][67.2MB/s]
bash-4.1$

In billing database:

billing=> select * from billinginfo where pnfsid = '00009E7795AB5A68466F90019BA6B0D23F48' and datestamp > '2017-07-03' and client = '131.225.190.179';
client | initiator | isnew | protocol | transfersize | fullsize | storageclass | connectiontime | action | cellname | datestamp | errorcode | errormessage | pnfsid | transaction | p2p
131.225.190.179 | door:GFTP-cmsstor352-AAVTbGvaGPA:1499100015349000 | f | GFtp-2.0 | 2184489359 | 2184489359 | cms.cms11@enstore | 32949 | transfer | w-cmsstor614-disk-disk3 | 2017-07-03 11:40:48.313-05 | 0 | | 00009E7795AB5A68466F90019BA6B0D23F48 | pool:w-cmsstor614-disk-disk3:1499100048313-997504 | f
131.225.190.179 | door:XrootdLFNs-cmsdcadisk01:AAVTbHYIMRg:1499100185504000 | f | Xrootd-2.7 | 2184489359 | 2184489359 | cms.cms11@enstore | 35060 | transfer | w-cmsstor614-disk-disk3 | 2017-07-03 11:43:40.579-05 | 0 | | 00009E7795AB5A68466F90019BA6B0D23F48 | pool:w-cmsstor614-disk-disk3:1499100220579-997511 | f
131.225.190.179 | door:XrootdLFNs-cmsdcadisk01:AAVTbNAnUCA:1499101697500000 | f | Xrootd-2.7 | 2184489359 | 2184489359 | cms.cms11@enstore | 31244 | transfer | w-cmsstor614-disk-disk3 | 2017-07-03 12:08:48.761-05 | 0 | | 00009E7795AB5A68466F90019BA6B0D23F48 | pool:w-cmsstor614-disk-disk3:1499101728761-997581 | f
(3 rows)

billing=>
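
A follow-up query of the same shape (sketch only; the column names are taken from the output above, and the filter values are illustrative) could be used to list other recent transfers served by the suspect pool:

select datestamp, client, protocol, transfersize, connectiontime, errorcode
  from billinginfo
  where cellname = 'w-cmsstor614-disk-disk3' and datestamp > '2017-07-03'
  order by datestamp desc limit 20;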

We do not see a problem here.
-- by huangch at Mon Jul 03 17:11:26 UTC 2017

GOCTX Source: https://fermi.service-now.com/nav_to.do?uri=incident.do?sys_id=INC000000865799
by OSG-GOC 
Reassigning to DMS
-- by dmason at Mon Jul 03 15:55:44 UTC 2017

GOCTX Source: https://fermi.service-now.com/nav_to.do?uri=incident.do?sys_id=INC000000865799
by OSG-GOC 
Hello,

Last week, transfers to T1_UK_RAL_Disk, T1_FR_CCIN2P3_Disk, and T1_ES_PIC_Disk have been failing with "TRANSFER TRANSFER  Operation timed out" [1][2].

I tried to transfer some of the files locally and I get an average rate of ~450KB/s. This rate is also consistent with what I find in the FTS3 logs (here is an example: [3]).

Can you take a look and identify if there is something that can be done to increase the transfer rate?

Best,
David Urbina

[1] https://cmsweb.cern.ch/phedex/prod/Activity::QualityPlots?graph=quality_all&entity=dest&src_filter=T1_US_FNAL&dest_filter=T0%7CT1%7CT2_CH_CERN&no_mss=true&period=l7d&upto=
[2] https://cmsweb.cern.ch/phedex/prod/Activity::ErrorInfo?tofilter=T1_UK_RAL_Disk&fromfilter=T1_US_FNAL&report_code=.*&xfer_code=.*&to_pfn=.*&from_pfn=.*&log_detail=.*&log_validate=.*&.submit=Update#
[3] https://lcgfts07.gridpp.rl.ac.uk:8449/var/log/fts3/2017-06-30/cmsdcadisk01.fnal.gov__srm-cms.gridpp.rl.ac.uk/2017-06-30-1809__cmsdcadisk01.fnal.gov__srm-cms.gridpp.rl.ac.uk__557984284__ef6557de-5db5-11e7-8860-001dd8b71ccc

[Ticket Origin]
https://ggus.eu/ws/ticket_info.php?ticket=129300

[Ticket History]
Subject: Timeouts from FNAL
Ticket Type: USER
CC: cms-comp-ops-transfer-team@....
-- by David Urbina at Fri Jun 30 22:09:33 UTC 2017

Checking FTS3 monitoring, these timeouts are the most common error for transfers to other sites as well, not just the three mentioned above [1].

Also, if I take a look at the quality of Debug transfers, they are not showing these timeouts [2]. Debug transfer rates to sites have been good (~15MB/s) (for example: [3]).

[1] https://lcgfts3.gridpp.rl.ac.uk:8449/fts3/ftsmon/#/?vo=cms&source_se=srm:%2F%2Fcmsdcadisk01.fnal.gov&dest_se=&time_window=168
[2] https://cmsweb.cern.ch/phedex/debug/Activity::QualityPlots?graph=quality_all&entity=link&src_filter=T1_US_FNAL&dest_filter=T0%7CT1%7CT2_CH_CERN&no_mss=true&period=l7d&upto=&.submit=Update
[3] https://lcgfts3.gridpp.rl.ac.uk:8449/fts3/ftsmon/#/job/1fab4c18-5d2b-11e7-9223-001dd8b71cf8

-- by David Urbina at Fri Jun 30 22:25:33 UTC 2017

We will follow up -- looks like this ticket did not get properly routed to our internal ticketing system.  Seeing if this reply manifests on our side...

