From patchwork Tue Sep 3 11:05:41 2013
X-Patchwork-Submitter: Neil Williams
X-Patchwork-Id: 19705
To: Linaro Patch Tracker
From: noreply@launchpad.net
Subject: [Branch ~linaro-validation/lava-scheduler/trunk] Rev 257: Neil Williams 2013-09-02 Ensure there is an actual device before trying
Message-Id: <20130903110541.17878.98750.launchpad@ackee.canonical.com>
Date: Tue, 03 Sep 2013 11:05:41 -0000
X-Launchpad-Project: lava-scheduler
X-Launchpad-Branch: ~linaro-validation/lava-scheduler/trunk
X-Launchpad-Branch-Revision-Number: 257
X-Launchpad-Notification-Type: branch-revision

Merge authors:
  Neil Williams (codehelp)
Related merge proposals:
  https://code.launchpad.net/~codehelp/lava-scheduler/reserved-boards/+merge/183496
  proposed by: Neil Williams (codehelp)
  review: Approve - Antonio Terceiro (terceiro)
------------------------------------------------------------
revno: 257 [merge]
committer: Neil Williams
branch nick: lava-scheduler
timestamp: Tue 2013-09-03 12:03:25 +0100
message:
  Neil Williams 2013-09-02 Ensure there is an actual device before trying to check it.
  Neil Williams 2013-09-02 Ensure the device transitions to Running once the job (single node or multi node) starts to run.
  Neil Williams 2013-09-02 Add a real user to the transition.
  Neil Williams 2013-09-02 [merge] Add Senthil's original change to make it easier to deploy.
  Neil Williams 2013-09-02 Add a new RESERVED device status and stop _fix_device from overloading RUNNING for the period after submission but before a MultiNode job is running.
modified:
  lava_scheduler_app/admin.py
  lava_scheduler_app/api.py
  lava_scheduler_app/models.py
  lava_scheduler_app/views.py
  lava_scheduler_daemon/dbjobsource.py
  lava_scheduler_daemon/service.py
---
lp:lava-scheduler
https://code.launchpad.net/~linaro-validation/lava-scheduler/trunk

You are subscribed to branch lp:lava-scheduler.
To unsubscribe from this branch go to https://code.launchpad.net/~linaro-validation/lava-scheduler/trunk/+edit-subscription

=== modified file 'lava_scheduler_app/admin.py'
--- lava_scheduler_app/admin.py	2013-08-28 15:13:07 +0000
+++ lava_scheduler_app/admin.py	2013-09-02 15:14:15 +0000
@@ -8,7 +8,7 @@
 
 
 def offline_action(modeladmin, request, queryset):
-    for device in queryset.filter(status__in=[Device.IDLE, Device.RUNNING]):
+    for device in queryset.filter(status__in=[Device.IDLE, Device.RUNNING, Device.RESERVED]):
         if device.can_admin(request.user):
             device.put_into_maintenance_mode(request.user, "admin action")
 offline_action.short_description = "take offline"

=== modified file 'lava_scheduler_app/api.py'
--- lava_scheduler_app/api.py	2013-08-28 15:13:07 +0000
+++ lava_scheduler_app/api.py	2013-09-02 15:14:15 +0000
@@ -2,7 +2,6 @@
 from simplejson import JSONDecodeError
 from django.db.models import Count
 from linaro_django_xmlrpc.models import ExposedAPI
-from lava_scheduler_app import utils
 from lava_scheduler_app.models import (
     Device,
     DeviceType,
@@ -165,8 +164,8 @@
             .annotate(idle=SumIf('device', condition='status=%s' % Device.IDLE),
                       offline=SumIf('device', condition='status in (%s,%s)' %
                                     (Device.OFFLINE, Device.OFFLINING)),
-                      busy=SumIf('device', condition='status=%s'
-                                 % Device.RUNNING), ).order_by('name')
+                      busy=SumIf('device', condition='status in (%s,%s)'
+                                 % (Device.RUNNING, Device.RESERVED)), ).order_by('name')
 
     for dev_type in device_types:
         device_type = {}

=== modified file 'lava_scheduler_app/models.py'
--- lava_scheduler_app/models.py	2013-08-28 15:13:07 +0000
+++ lava_scheduler_app/models.py	2013-09-02 15:42:27 +0000
@@ -51,19 +51,20 @@
 
 
 def check_device_availability(requested_devices):
-    """Checks whether the number of devices requested is available.
+    """Checks whether the number of devices requested is available for a multinode job.
 
     See utils.requested_device_count() for details of REQUESTED_DEVICES
     dictionary format.
 
-    Returns True if the requested number of devices are available, else
-    raises DevicesUnavailableException.
+    Returns True for singlenode or if the requested number of devices are available
+    for the multinode job, else raises DevicesUnavailableException.
""" device_types = DeviceType.objects.values_list('name').filter( - models.Q(device__status=Device.IDLE) | \ - models.Q(device__status=Device.RUNNING) + models.Q(device__status=Device.IDLE) | + models.Q(device__status=Device.RUNNING) | + models.Q(device__status=Device.RESERVED) ).annotate( - num_count=models.Count('name') + num_count=models.Count('name') ).order_by('name') if requested_devices: @@ -115,6 +116,7 @@ RUNNING = 2 OFFLINING = 3 RETIRED = 4 + RESERVED = 5 STATUS_CHOICES = ( (OFFLINE, 'Offline'), @@ -122,6 +124,7 @@ (RUNNING, 'Running'), (OFFLINING, 'Going offline'), (RETIRED, 'Retired'), + (RESERVED, 'Reserved') ) # A device health shows a device is ready to test or not @@ -201,7 +204,7 @@ return user.has_perm('lava_scheduler_app.change_device') def put_into_maintenance_mode(self, user, reason): - if self.status in [self.RUNNING, self.OFFLINING]: + if self.status in [self.RUNNING, self.RESERVED, self.OFFLINING]: new_status = self.OFFLINING else: new_status = self.OFFLINE @@ -236,6 +239,16 @@ self.health_status = Device.HEALTH_LOOPING self.save() + def cancel_reserved_status(self, user, reason): + if self.status != Device.RESERVED: + return + new_status = self.IDLE + DeviceStateTransition.objects.create( + created_by=user, device=self, old_state=self.status, + new_state=new_status, message=reason, job=None).save() + self.status = new_status + self.save() + class JobFailureTag(models.Model): """ @@ -324,7 +337,7 @@ tags = models.ManyToManyField(Tag, blank=True) - # This is set once the job starts. + # This is set once the job starts or is reserved. actual_device = models.ForeignKey( Device, null=True, default=None, related_name='+', blank=True) @@ -598,6 +611,10 @@ return self._can_admin(user) and self.status in states def cancel(self): + # if SUBMITTED with actual_device - clear the actual_device back to idle. 
+        if self.status == TestJob.SUBMITTED and self.actual_device is not None:
+            device = Device.objects.get(hostname=self.actual_device)
+            device.cancel_reserved_status(self.submitter, "multinode-cancel")
         if self.status == TestJob.RUNNING:
             self.status = TestJob.CANCELING
         else:

=== modified file 'lava_scheduler_app/views.py'
--- lava_scheduler_app/views.py	2013-08-28 15:13:07 +0000
+++ lava_scheduler_app/views.py	2013-09-02 15:14:15 +0000
@@ -371,7 +371,8 @@
         .annotate(idle=SumIf('device', condition='status=%s' % Device.IDLE),
                   offline=SumIf('device', condition='status in (%s,%s)' %
                                 (Device.OFFLINE, Device.OFFLINING)),
-                  busy=SumIf('device', condition='status=%s' % Device.RUNNING),).order_by('name')
+                  busy=SumIf('device', condition='status in (%s,%s)' %
+                             (Device.RUNNING, Device.RESERVED)),).order_by('name')
 
     def render_status(self, record):
         return "%s idle, %s offline, %s busy" % (record.idle,
@@ -535,7 +536,7 @@
                 'health_jobs', reverse(health_jobs_json, kwargs=dict(pk=pk)),
                 params=(device,)),
             'show_maintenance': device.can_admin(request.user) and
-                device.status in [Device.IDLE, Device.RUNNING],
+                device.status in [Device.IDLE, Device.RUNNING, Device.RESERVED],
             'show_online': device.can_admin(request.user) and
                 device.status in [Device.OFFLINE, Device.OFFLINING],
             'bread_crumb_trail': BreadCrumbTrail.leading_to(health_job_list, pk=pk),
@@ -993,7 +994,7 @@
                 'jobs', reverse(recent_jobs_json, kwargs=dict(pk=device.pk)),
                 params=(device,)),
             'show_maintenance': device.can_admin(request.user) and
-                device.status in [Device.IDLE, Device.RUNNING],
+                device.status in [Device.IDLE, Device.RUNNING, Device.RESERVED],
             'show_online': device.can_admin(request.user) and
                 device.status in [Device.OFFLINE, Device.OFFLINING],
             'bread_crumb_trail': BreadCrumbTrail.leading_to(device_detail, pk=pk),

=== modified file 'lava_scheduler_daemon/dbjobsource.py'
--- lava_scheduler_daemon/dbjobsource.py	2013-08-31 01:38:21 +0000
+++ lava_scheduler_daemon/dbjobsource.py	2013-09-02 18:14:25 +0000
@@ -129,14 +129,18 @@
 
     def _fix_device(self, device, job):
         """Associate an available/idle DEVICE to the given JOB.
+        If the MultiNode job is waiting as Submitted, the device
+        could be running a different job.
 
         Returns the job with actual_device set to DEVICE.
 
         If we are unable to grab the DEVICE then we return None.
         """
+        if device.status == Device.RUNNING:
+            return None
         DeviceStateTransition.objects.create(
             created_by=None, device=device, old_state=device.status,
-            new_state=Device.RUNNING, message=None, job=job).save()
-        device.status = Device.RUNNING
+            new_state=Device.RESERVED, message=None, job=job).save()
+        device.status = Device.RESERVED
         device.current_job = job
         try:
             # The unique constraint on current_job may cause this to
@@ -190,10 +194,10 @@
             for d in devices:
                 self.logger.debug("Checking %s" % d.hostname)
                 if d.hostname in configured_boards:
-                    if job:
-                        job = self._fix_device(d, job)
-                        if job:
-                            job_list.add(job)
+                    if job:
+                        job = self._fix_device(d, job)
+                        if job:
+                            job_list.add(job)
 
             # Remove scheduling multinode jobs until all the jobs in the
             # target_group are assigned devices.
@@ -288,6 +292,14 @@
 
     def getJobDetails_impl(self, job):
         job.status = TestJob.RUNNING
+        # need to set the device RUNNING if device was RESERVED
+        if job.actual_device.status == Device.RESERVED:
+            DeviceStateTransition.objects.create(
+                created_by=None, device=job.actual_device, old_state=job.actual_device.status,
+                new_state=Device.RUNNING, message=None, job=job).save()
+            job.actual_device.status = Device.RUNNING
+            job.actual_device.current_job = job
+            job.actual_device.save()
         job.start_time = datetime.datetime.utcnow()
         shutil.rmtree(job.output_dir, ignore_errors=True)
         job.log_file.save('job-%s.log' % job.id, ContentFile(''), save=False)
@@ -316,6 +328,8 @@
             device.status = Device.IDLE
         elif device.status == Device.OFFLINING:
             device.status = Device.OFFLINE
+        elif device.status == Device.RESERVED:
+            device.status = Device.IDLE
         else:
             self.logger.error(
                 "Unexpected device state in jobCompleted: %s" % device.status)

=== modified file 'lava_scheduler_daemon/service.py'
--- lava_scheduler_daemon/service.py	2013-08-30 18:07:18 +0000
+++ lava_scheduler_daemon/service.py	2013-09-02 18:14:53 +0000
@@ -47,7 +47,7 @@
             x.hostname for x in dispatcher_config.get_devices()]
 
         for job in job_list:
-            if job.actual_device.hostname in configured_boards:
+            if job.actual_device and job.actual_device.hostname in configured_boards:
                 new_job = JobRunner(self.source, job, self.dispatcher,
                                     self.reactor, self.daemon_options)
                 self.logger.info("Starting Job: %d " % job.id)
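Editorial note (not part of the commit): taken together, the hunks above change the device lifecycle for MultiNode submissions: _fix_device now moves an IDLE board to the new RESERVED state instead of RUNNING, getJobDetails_impl promotes a RESERVED board to RUNNING when the job actually starts, and jobCompleted (or a cancel of a still-SUBMITTED job) returns it to IDLE. The standalone Python sketch below only illustrates those transitions; the names DeviceStatus, reserve, start_job and complete_or_cancel are hypothetical and do not exist in lava-scheduler, whose real logic is in lava_scheduler_app/models.py and lava_scheduler_daemon/dbjobsource.py as shown in the diff.

# Illustrative sketch only -- not part of the patch; names are hypothetical.
from enum import Enum


class DeviceStatus(Enum):
    OFFLINE = 0
    IDLE = 1
    RUNNING = 2
    OFFLINING = 3
    RETIRED = 4
    RESERVED = 5  # new status introduced by revno 257


def reserve(status):
    """Mirror _fix_device: an IDLE board is reserved for a submitted
    MultiNode job; a board already RUNNING another job is left alone."""
    if status == DeviceStatus.RUNNING:
        return None
    return DeviceStatus.RESERVED


def start_job(status):
    """Mirror getJobDetails_impl: a RESERVED board becomes RUNNING
    once its job starts."""
    return DeviceStatus.RUNNING if status == DeviceStatus.RESERVED else status


def complete_or_cancel(status):
    """Mirror jobCompleted / cancel: RUNNING and RESERVED boards return
    to IDLE, an OFFLINING board goes OFFLINE."""
    if status in (DeviceStatus.RUNNING, DeviceStatus.RESERVED):
        return DeviceStatus.IDLE
    if status == DeviceStatus.OFFLINING:
        return DeviceStatus.OFFLINE
    return status


if __name__ == "__main__":
    state = reserve(DeviceStatus.IDLE)   # -> RESERVED
    state = start_job(state)             # -> RUNNING
    state = complete_or_cancel(state)    # -> IDLE
    print(state)

Running the sketch prints DeviceStatus.IDLE, i.e. a board that was reserved, ran its job and was released, which is the behaviour the RESERVED status is meant to make explicit between submission and job start.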