Abstract

We consider a multiround auction setting motivated by pay-per-click auctions for Internet advertising. In each round the auctioneer selects an advertiser and shows her ad, which is then either clicked or not. An advertiser derives value from clicks; the value of a click is her private information. Initially, neither the auctioneer nor the advertisers have any information about the likelihood of clicks on the advertisements. The auctioneer's goal is to design a (dominant strategies) truthful mechanism that (approximately) maximizes the social welfare. If the advertisers bid their true private values, our problem is equivalent to the multi-armed bandit problem, and thus can be viewed as a strategic version of the latter. In particular, for both problems the quality of an algorithm can be characterized by regret, the difference in social welfare between the algorithm and the benchmark which always selects the same “best” advertisement. We investigate how the design of multi-armed bandit algorithms is affected by the restriction that the resulting mechanism must be truthful. We find that deterministic truthful mechanisms have certain strong structural properties (essentially, they must separate exploration from exploitation) and they incur much higher regret than the optimal multi-armed bandit algorithms. Moreover, we provide a truthful mechanism which (essentially) matches our lower bound on regret.
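
To make the notions of regret and of separating exploration from exploitation concrete, the following is a minimal sketch and not the mechanism from the paper. It assumes a fixed round-robin exploration phase that ignores bids, followed by committing to the ad with the highest bid times estimated click probability. The function name, its parameters, and the choice to measure welfare in expectation are all illustrative assumptions; payments, which a truthful mechanism also needs, are omitted.

```python
import random


def regret_of_explore_then_exploit(bids, ctrs, rounds, explore_rounds, seed=0):
    """Simulate one run of a hypothetical explore-then-exploit allocation.

    bids[i] is advertiser i's per-click value, ctrs[i] the true click
    probability of ad i (unknown to the allocation rule). Returns the regret:
    expected welfare of always showing the best ad minus the expected welfare
    of the ads actually shown. Assumes explore_rounds <= rounds.
    """
    rng = random.Random(seed)
    k = len(bids)
    shows = [0] * k
    clicks = [0] * k
    welfare = 0.0

    # Exploration: round-robin over the ads, independent of the bids.
    for t in range(explore_rounds):
        ad = t % k
        shows[ad] += 1
        clicks[ad] += 1 if rng.random() < ctrs[ad] else 0
        welfare += bids[ad] * ctrs[ad]

    # Exploitation: commit to the highest bid * estimated click probability.
    # Clicks observed from here on are never fed back into the estimates.
    def score(ad):
        estimate = clicks[ad] / shows[ad] if shows[ad] else 0.0
        return bids[ad] * estimate

    chosen = max(range(k), key=score)
    welfare += (rounds - explore_rounds) * bids[chosen] * ctrs[chosen]

    # Benchmark: always show the single ad with the highest expected value.
    benchmark = rounds * max(b * c for b, c in zip(bids, ctrs))
    return benchmark - welfare
```

For example, `regret_of_explore_then_exploit([1.0, 1.0], [1.0, 0.0], rounds=10, explore_rounds=4)` returns `2.0`: the two exploration rounds spent on the ad that is never clicked are exactly the welfare lost relative to the benchmark.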

Keywords

  1. algorithmic mechanism design
  2. truthful mechanisms
  3. single-parameter auctions
  4. pay-per-click auctions
  5. multi-armed bandits
  6. regret

MSC codes

  1. 91A99
  2. 91B26
  3. 91A26
  4. 91A05
  5. 91A06
  6. 68Q25
  7. 68Q32
  8. 68W27
  9. 68W40

Information

Published In

SIAM Journal on Computing
Pages: 194 - 230
ISSN (online): 1095-7111

History

Submitted: 29 May 2012
Accepted: 26 November 2013
Published online: 6 February 2014
