Per-Activity Power Usage

Background

As OLPC approaches its goal of highly effective power management, it will eventually be the case that different Activities cause the XO to draw different amounts of battery power. There has been some suggestion that users may wish to know how each Activity affects their battery life. By providing users with feedback about which Activities draw the most power, we may help them to extend their battery life, and also apply appropriate pressure on developers to increase the efficiency of their Activities. There might even be security benefits, in the case of activities running hidden spambots or password-crackers.

There are two main questions that a user might ask about each Activity: "how much power does it tend to use?", and "how much power is it using right now?". We may call these "historical" and "instantaneous" power usage.

The proposed power management systems derive most of their effectiveness from turning off the CPU when software does not need it. The system provides two important diagnostics for power management. The first is a Coulomb Counter, which allows the system software to determine how much charge has been drawn from the battery. The second is a wakeup history, which allows us to determine when the CPU was powered on or off.

Instantaneous Power Usage

Instantaneous power usage is not well-defined on a per-activity basis due to the multitasking nature of the OS. However, it would not be difficult to determine the instantaneous total power usage by, for example, sampling the Coulomb Counter at one-minute intervals and noting the rate of discharge.
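
As a rough illustration, the sketch below estimates the total discharge rate by sampling a charge counter twice and converting the difference into watts. It assumes the standard Linux power_supply sysfs attributes (charge_now in microamp-hours, voltage_now in microvolts) and a guessed battery path; the real XO battery interface may differ.

<pre>
# Sketch: estimate average system power draw from two charge-counter samples.
# Assumes standard power_supply sysfs attributes; the path below is a guess.
import time

BATTERY = "/sys/class/power_supply/olpc-battery"   # hypothetical path

def read_micro(name):
    """Read a sysfs attribute reported in micro-units (uAh or uV)."""
    with open("%s/%s" % (BATTERY, name)) as f:
        return int(f.read().strip())

def average_watts(interval=60):
    """Estimate average power draw over `interval` seconds of discharge."""
    charge0 = read_micro("charge_now")               # micro-amp-hours
    time.sleep(interval)
    charge1 = read_micro("charge_now")
    voltage = read_micro("voltage_now")              # micro-volts
    coulombs = (charge0 - charge1) * 1e-6 * 3600.0   # uAh -> coulombs
    joules = coulombs * voltage * 1e-6               # coulombs * volts -> J
    return joules / interval

if __name__ == "__main__":
    print("Average draw: %.2f W" % average_watts())
</pre>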

If the system uses aggressive power management, then there may be several excellent proxies for instantaneous per-activity power usage. The best such proxy might be a combination of CPU utilization and system wakeups. The details of how to combine these two depend on the suspend system's idle detection strategies. For example, if idleness is detected solely on the basis of the time until the next scheduled event, then the best strategy might be to assign the entire power usage during each CPU-on period to the activity whose wakeup event started that period, or to the active activity if the CPU-on period was triggered by the user. If the idle detection is based on CPU usage, then one might instead divide power usage among activities in proportion to CPU usage. There are various steps in between these two options; the best solution will probably require empirical fiddling.
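
As an illustration of the CPU-proportional strategy, the sketch below divides a measured energy figure among the running activities according to the CPU time each consumed during the interval; the activity names and numbers are purely illustrative.

<pre>
# Sketch: split a measured energy figure among activities in proportion to
# the CPU time each consumed during the interval (illustrative values only).
def attribute_energy(joules, cpu_seconds_by_activity):
    """Return a dict mapping each activity to its share of `joules`."""
    total_cpu = sum(cpu_seconds_by_activity.values())
    if total_cpu == 0:
        # Nothing used the CPU; charge the whole interval to the base system.
        return {"base-system": joules}
    return dict((activity, joules * cpu / total_cpu)
                for activity, cpu in cpu_seconds_by_activity.items())

# Example: 30 J used while Browse consumed 12 s of CPU time and Record 3 s.
print(attribute_energy(30.0, {"Browse": 12.0, "Record": 3.0}))
</pre>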

Historical Power Usage

If instantaneous power usage numbers can be calculated with satisfactory accuracy, then the historical power usage can be calculated trivially, using simple techniques like exponential weighting. However, it seems likely that the instantaneous power usage statistics will not be especially accurate, due to the heuristic nature of the methods.
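
For example, per-activity readings could be blended into a running average with exponential weighting roughly as in the sketch below; the smoothing factor is an arbitrary illustrative choice.

<pre>
# Sketch: exponentially weighted running average of per-activity wattages.
def update_history(history, reading, alpha=0.1):
    """Blend one new reading (a dict of activity -> watts) into `history`."""
    for activity, watts in reading.items():
        previous = history.get(activity, watts)   # first reading seeds the average
        history[activity] = (1.0 - alpha) * previous + alpha * watts
    return history
</pre>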

For historical power usage, we have a much more powerful tool: the Coulomb Counter. However, the Coulomb Counter only tells us aggregate system power usage. In order to determine the contribution of each activity to power usage, we must correlate the activity use history with the power usage history. This requires a model of how activities interact with regard to power usage.

The Simplest Model: Additive power

The simplest model for power usage is that Activity <math>i</math> draws a wattage <math>w_i</math>. We let <math>w_0</math> represent the power used by the base system when no activities are running, and treat the base system as always running. If the set of running Activities (including the base system) at a given moment is <math>R</math>, then the total power usage is <math>T = \sum_{i\in R} w_i</math>.

Whenever the user starts or stops an activity under battery power, the system should make a log of the change in coulombs since the last reading, the time since the last reading, and the list of activities that were running since the last reading. Multiplying the charge by the system voltage gives the number of Joules used, so each such record provides an equation <math>J = t \sum_{i\in R} w_i</math>, where <math>t</math> is the length of that interval. To combine all of these measurements, we create a sparse matrix <math>A</math> with one column for each Activity and one row for each reading. The element for Activity <math>i</math> and reading <math>j</math> is 0 if Activity <math>i</math> was not running during reading <math>j</math>; if Activity <math>i</math> was running during reading <math>j</math>, the element is <math>t_j</math>. We also create a column vector <math>b</math> with one entry for each reading, with <math>b_j</math> equal to the number of Joules used during reading <math>j</math>.
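
A minimal sketch of assembling this system from logged readings is shown below. The record format is an assumption, and the base system is represented here by a pseudo-activity that appears in every running set, so that <math>w_0</math> receives its own column.

<pre>
# Sketch: build the matrix A and vector b from logged readings.
# Each record is assumed to be (seconds, joules, set of running activities).
import numpy as np

def build_system(records, activities):
    """Return (A, b) with one row per reading and one column per activity."""
    column = dict((name, i) for i, name in enumerate(activities))
    A = np.zeros((len(records), len(activities)))
    b = np.zeros(len(records))
    for row, (seconds, joules, running) in enumerate(records):
        for name in running:
            A[row, column[name]] = seconds    # t_j in the running columns
        b[row] = joules
    return A, b

activities = ["base-system", "Browse", "Record"]          # illustrative
records = [(600.0, 3600.0, {"base-system", "Browse"}),
           (300.0, 1500.0, {"base-system", "Record"})]
A, b = build_system(records, activities)
</pre>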

We may now formulate the wattage problem as <math>Aw = b</math>, where <math>w</math> is the column vector of per-activity wattages that we wish to know. The matrix <math>A</math> is non-square, so this system cannot be solved by inversion. However, there are many well-known fast techniques for determining the least-squares solution to an over- or under-determined linear system, such as Conjugate Gradients on the Normal Equations (CGNR). Additionally, due to measurement error or modelling error, it is possible that the least-squares solution may produce negative wattages. To prevent this in an optimal fashion, we may restrict <math>w</math> to be non-negative and use fast Quadratic Programming methods to solve the restricted problem.
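
As a toy example of the restricted problem, and assuming SciPy is available, non-negative least squares handles the positivity restriction directly; the readings below are invented so that an exact non-negative solution exists.

<pre>
# Sketch: non-negative least-squares fit of per-activity wattages,
# assuming SciPy is available.  The readings are invented for illustration.
import numpy as np
from scipy.optimize import nnls

# Columns: base system, Browse, Record.  Rows: three logged readings.
A = np.array([[600.0, 600.0,   0.0],
              [300.0,   0.0, 300.0],
              [900.0, 900.0, 900.0]])
b = np.array([3600.0, 1500.0, 9000.0])   # joules used during each reading

w, residual = nnls(A, b)                  # w >= 0 elementwise
print(w)    # approximately [1.0, 5.0, 4.0] watts
</pre>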

Problems

This model is linear and additive; the presence of multiple activities does not affect how much power each one draws. This assumption is not very good, especially for activities with high power usage: if two activities would each draw full power independently, running them simultaneously will not double the power usage. This model also does not account for the level of user activity, the network environment, or any other factors that may cause variation in power usage. Nonetheless, it is likely to be quite effective at distinguishing between the Activities that tend to draw a great deal of power and those that do not.

One possible improvement would be to allocate <math>w_1</math> to the backlight level, <math>w_2</math> to the DCON, etc. This will improve accuracy somewhat, but does not solve the fundamental problems with this model.

A better model: additive usage with a ceiling

The laptop is only capable of drawing, at most, a maximum wattage <math>M</math>. Therefore, a better model of the system imagines that each activity would independently draw a wattage <math>w_i</math>, and the total system power drain is the sum of the <math>w_i</math> or the maximum wattage, whichever is lower. Mathematically, we would say <math>T = \min \left( M, \sum_{i\in R} w_i\right)</math>. We may then follow the model from above, accounting for the energy used during each period in which a fixed set of activities is open. If that energy usage <math>J</math> is less than <math>Mt</math>, then we know that we are in the regime where <math>T = \sum_{i\in R} w_i</math>, and we may add a row to the matrix <math>A</math>, as described above. Conversely, if we have <math>J=Mt</math>, then we know that <math>\sum_{i\in R} w_i \geq M</math>. We may write this inequality as a constraint, in addition to the non-negativity requirement on <math>w</math>. Quadratic programming supports an unlimited number of such constraints, and the problem remains convex, and therefore quickly solvable.

Problem: Measurement Error

These measurements of <math>J</math> and <math>M</math> will contain some error. As a result, it is entirely possible that <math>J</math> will never be measured as being quite equal to <math>Mt</math>. To relax this requirement somewhat, we may simply underestimate <math>M</math>. There is also the reverse problem: if the true energy usage over an interval is slightly less than <math>Mt</math>, then it is possible that a measurement error will result in the measured <math>J</math> crossing the threshold, and being added to the model as a constraint rather than as a datapoint. In quadratic programming, constraints are absolute, so this erroneous constraint will not "average out".

To solve this problem, we may set the threshold at <math>Mt</math> and, when it is crossed, impose an inequality constraint of the form <math>\sum_{i\in R} w_i \geq M - \epsilon</math>, where <math>\epsilon</math> is greater than any expected measurement error. This reduces the accuracy of the measurements for high-load activities, but also reduces the chance of spuriously constraining a set of activities whose actual power draw is much lower.
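
A sketch of the resulting optimization is shown below, using SciPy's SLSQP solver as a stand-in for a dedicated quadratic-programming routine: readings below the ceiling contribute rows of <math>A</math>, readings at the ceiling contribute inequality constraints with the <math>\epsilon</math> slack, and every wattage is bounded below by zero. The maximum wattage, slack, and data are illustrative assumptions.

<pre>
# Sketch: minimize |A w - b|^2 subject to w >= 0 and, for every reading that
# hit the ceiling, sum of w_i over its running set >= M - epsilon.
# SciPy's SLSQP solver stands in for a dedicated QP routine; data is made up.
import numpy as np
from scipy.optimize import minimize

M = 10.0                                   # assumed maximum wattage
epsilon = 0.5                              # slack exceeding measurement error
A = np.array([[600.0, 600.0,   0.0],       # readings that stayed below M*t
              [300.0,   0.0, 300.0]])
b = np.array([3600.0, 1500.0])             # joules for those readings
ceiling_sets = [[0, 1, 2]]                 # running sets observed at J = M*t

def objective(w):
    r = A.dot(w) - b
    return r.dot(r)

constraints = [{"type": "ineq",
                "fun": lambda w, s=s: sum(w[i] for i in s) - (M - epsilon)}
               for s in ceiling_sets]
bounds = [(0.0, None)] * A.shape[1]        # w_i >= 0

result = minimize(objective, np.ones(A.shape[1]), method="SLSQP",
                  bounds=bounds, constraints=constraints)
print(result.x)                            # estimated wattages
</pre>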

Problem: Unlimited Growth

In this description <math>A</math> has a column for every Activity and a row for every measurement. If <math>A</math> is stored sparsely, and there are 5 activities running on average, then each measurement costs about 60 bytes in double precision. If the user starts or stops Activities 100 times a day, then <math>A</math> grows by 6 KB per day, or a little over 2 MB per year. This is too much storage, and the linear algebra on a matrix of this size becomes very expensive. One solution is to simply age out the oldest rows in <math>A</math> as new rows are added. However, to reach a reasonable size (e.g. 100 KB) entries would generally be forgotten in a month or less, which may be too short a time to accumulate enough data.

Luckily, there is a helpful trick that we may use. Many algorithms for minimizing <math>|Ax - b|</math> work by solving the equivalent equation <math>A^TAx = A^Tb</math>. The matrix <math>A^TA</math> is <math>N\times N</math> if there are <math>N</math> Activities installed. If there are 100 installed Activities, then storing <math>A^TA</math> probably requires about 40 KB in double precision (the matrix is symmetric, so we only have to store half the entries). Instead of appending a new row <math>\rho</math> to <math>A</math>, we can equivalently replace <math>A^TA</math> by <math>A^TA + \rho^T\rho</math>. Similarly, instead of appending a new entry <math>J</math> to the column vector <math>b</math>, we may replace <math>A^Tb</math> by <math>A^Tb + \rho^TJ</math>.
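
A sketch of this folding step, with illustrative numbers:

<pre>
# Sketch: fold a new reading into the normal equations instead of storing A.
import numpy as np

def fold_in_reading(AtA, Atb, row, joules):
    """Update A^T A and A^T b in place with one new reading.

    `row` is the new row of A (t_j in the columns of the running activities,
    zero elsewhere) and `joules` is the new entry of b.
    """
    AtA += np.outer(row, row)     # A^T A  <-  A^T A + rho^T rho
    Atb += row * joules           # A^T b  <-  A^T b + rho^T J
    return AtA, Atb

# Three installed activities; one 600-second reading during which activities
# 0 and 1 were running and 3600 J were used (illustrative numbers).
AtA = np.zeros((3, 3))
Atb = np.zeros(3)
fold_in_reading(AtA, Atb, np.array([600.0, 600.0, 0.0]), 3600.0)
</pre>

The accumulated <math>A^TA</math> and <math>A^Tb</math> can then be handed to the solver in place of <math>A</math> and <math>b</math>, since minimizing <math>|Aw-b|^2</math> is equivalent to minimizing <math>w^TA^TAw - 2w^TA^Tb</math>.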

If <math>A^TA</math> grows too large, because the user has too many installed activities, then we may "age out" the least recently used Activities. This is somewhat harsher than aging out individual rows from <math>A</math>, but it is far less likely to be necessary, unless users typically have hundreds of Activities installed.

There remains the problem of unlimited growth in the number of constraints. There is no equivalent way of amalgamating constraints, because there are <math>2^N</math> possible constraints that might be generated. Moreover, the constraints are problematic because of the potential for one-time errors to result in erroneous hard constraints. In order to avoid permanent errors, and to conserve space, it seems sensible to age out the constraints. This is also advantageous because quadratic programming algorithms often run markedly slower as the number of constraints increases.

Problem: Non-additivity of power

There is no limit to the complexity with which power usages of different programs may interact. Indeed, predicting the power usage of a program is necessarily as hard as the Halting Problem, and therefore uncomputable. The assumption that the total power usage is simply a sum of individual per-program power drains is clearly false. A more sophisticated approach might, for example, train a neural network to predict how much power any combination of running activities would be likely to draw.

However, our goal is to present the user with an indication of how much power each Activity draws. The fact that this number is not well-defined means that we must come up with a sensible definition for it. The above constrained-minimization seems as logical a definition as any.