Nav view search



Planet MySQL

Planet MySQL -
  1. via GIPHY Amazon releases a new database offering every other day. It sure isn’t easy to keep up. Join 35,000 others and follow Sean Hull on twitter @hullsean. Let’s say you’re hiring a devops & you want to suss out their database knowledge? Or you’re hiring a professional services firm or freelance consultant. Whatever the … Continue reading How to interview an amazon database expert →
  2. Back in May I wrote about a web service I’m working on that uses local storage and replication with two-phase commit. I pulled out the core of it and created a package that I’m calling the Rig. It’s up on GitHub already: The goal of the Rig is to take some web service and add a log and replication on top. A service is simply something that accepts operations. type Service interface { Validate(Operation) error Apply(uint64, Operation) error LockResources(Operation) bool UnlockResources(Operation) } Each operation is associated with an entry in a log. type LogPayload struct { Version uint64 `json:"version"` Op Operation `json:"op"` } And an operation is a method with some data. type Operation struct { Method string `json:"method"` Data json.RawMessage `json:"data"` } The service I created has methods like user_create, user_update, user_delete, and so on. The Rig is still very rough. It kinda works… meaning it works great during “happy” times. There’s a lot of work to do to handle errors. Synchronous replication is one of those things that makes a lot of sense in the abstract, but can be all over the place in implementation. For example, consider synchronous replication with two nodes. Writes succeed as long as both the primary and the replica acknowledge the write. If the primary fails, usually the replica takes over without any loss of data. But then you’re left with only one node running, and no more replication. What if instead of the primary failing, only the replica fails? Should the primary keep going? If it does, didn’t we just ignore the synchronous part of the replication? But that’s the expected behavior for most systems. You want the system to be available even if a node fails. And when a node fails, you don’t want to lose any data. It’s kind of complicated to decide whether to just wait for a failing node or consider it failed and move on. I think it’s interesting to see how MySQL/MariaDB does it with semi-sync replication. The master will wait up to some configurable number of milliseconds for a replica to respond during semi-sync mode, and when the timeout is exceeded the master will leave the replica behind and continue in async mode. That way you’re not stuck with a failed replica, but are synchronous during “happy” times. The Rig just crashes the program right now when it can’t make progress with the replica =P. I’m working on it. Other stuff I was looking at: DRBD internals: Some two-phase commit stuff in DRBD: FreeBSD HAST:
  3. In this post, we’ll use a blog poll to find out what operating system you use to run your development database servers. In our last blog poll, we looked at what OS you use for your production database. Now we would like to see what you use for your development database. As databases grow to meet more challenges and expanding application demands, they must try and get the maximum amount of performance out of available resources. How they work with an operating system can affect many variables, and help or hinder performance. The operating system you use for your database can impact consumable choices (such as hardware and memory). The operating system you use can also impact your choice of database engine as well (or vice versa). When new projects, new applications or services or testing new architecture solutions, it makes sense to create a development environment in order to test and run scenarios before they hit production. Do you use the same OS in your development environment as you do your production environment? Please let us know what operating system you use to run your development database. For this blog poll, we’re asking which operating system you use to actually run your development database server (not the base operating system). If you’re running virtualized Linux on Windows, please select Linux as the OS used for development. Pick up to three that apply. Add any thoughts or other options in the comments section: Note: There is a poll embedded within this post, please visit the site to participate in this post's poll. Thanks in advance for your responses – they will help the open source community determine how database environments are being deployed.
  4. For virtually all development teams, testing code is a given: It's one of the most important parts of software development. Whether your organization includes a separate team devoted to QA, or your developers are testing their own code, QA is the primary way your team ensures that your application's logic is working correctly, and it's the best way for you to identify issues as early as possible. As a result, QA is critical for engineering velocity, and it helps shape your users' overall experience when engaging with your product. Nobody likes finding a broken app or website. But what about quality assurance for a database? Do most teams apply the same QA practices to improve their data tier? Do many teams even know how to perform database QA? In this article, we'll talk about how you and your team can apply the same high standards of QA principles to your database, which many organizations often overlook. Image Source General-Practice QA… and Why It Leaves the Database Behind For most tiers of a system, QA is well understood and practiced almost religiously. It's a cornerstone of a product's delivery process, and all responsible teams automatically include it in their workflows. There are sets of steps to follow for good application development QA, and there's a wide variety of tools and processes that support those steps: Unit testing/TDD and continuous integration. Benchmarking and application profiling. End-to-end and system tests, and pre-production environments. Production monitoring and automated health checks. One feature of many of these QA practices is the ability to test code at its most fundamental level—in isolation. In order to achieve this in unit testing, for example, testing tools and environments peel away dependency layers such as databases, so they can see exactly how code logic behaves. That means that many traditional QA unit testing practices are designed to overlook the databases, in favor of a focused examination of the app or code. This, clearly, is a core reason why practices for database QA testing are underdeveloped with many teams. For traditional services, QA is well understood and straightforward; for dependency layers, it offers very little. Just because it's not obvious how teams should QA their database, however, doesn't mean they shouldn't do it at all. Without database QA, you can't understand the following (among many other aspects): How changes affect existing loads. Fluctuations in concurrency and contention. Real data sets. Methods for QA Testing a Database What are the actual methods for doing QA on a database? The simple, tl;dr answer is that queries are the solution to everything. With insightful and detailed query analysis you can find out what's new in your database, measure change, and spot and fix problems early. Similarly, if you can understand how your queries are behaving and compare that behavior to how they should be behaving, you'll have a basis of quality to run QA against. But let's dig deeper. To do the most thorough QA testing of your database, you'll want to perform exercises like running against a test-dataset in a container for unit-testing/CI. This doesn't need to be excessively complicated — a small dataset that you can leverage to exercise new code paths can help provide initial validation that your databases are responding as expected. The simplest thing to look for here is desired output. You can also be more verbose and check query behavior, like running SHOW WARNINGS for MySQL, to ensure there's no unexpected query behavior — the query might return successfully, but a warning might indicate that it'll fail in a subtle way in production. These tests can also validate that the data contained in your database is what you expect it to be.The smaller your dataset, the stricter you can make your tests, but, as a trade off, the tests become more brittle and more likely to fail. In unit testing, developers strive for their testing to strike a balance between flexibility and strong data validation. Additionally, end-to-end testing and pre-production/production monitoring will give you a full picture of how the database behaves and changes in varying contexts. End-to-end testing in particular can confirm that your application and data tier work together as expected. And comparing the results from pre-production/production monitoring lets you identify new queries, detect any regressions in query performance, and quickly identify any errors that you missed in your original end-to-end testing. Finally, be sure to also incorporate regression testing, so you can track how modifications to the database have affected its behavior. Some potential regressions include: An increase in query latency or lock contention. An increase in the number of executions of a query — it's possible to accidentally put a query in a loop and not realize it until you see the query executing more frequently than expected. A change in the execution pattern of a query, which has, for example, caused more clients to issue the same query simultaneously, rather than spreading the query out over time. This might cause a change from a "smooth" execution to a pattern that's "spikier." If you follow normal QA testing standards, you may begin to wonder, "How do I simulate production traffic at scale for large datasets in a controlled environment?" The answer: You don't. Much of a database's function is responsive, and it depends on what the rest of the system is asking the database to do. Trying to simulate a full load—and the full complexity of a full load—is not feasible. This is why pre-production/production comparisons are so important: they can help you develop the perspective you need to understand what your database is actually doing in production versus how it behaves when in more optimal conditions. A Few More Things to Keep in Mind Databases change all the time. In some ways, each execution of a query in production represents a unique test of the database's behavior. From that point of view, database QA has to be a continual live production process. System changes and database versions impact how you expect your database to react in ways. Data sets can grow or shrink; read / write ratios change over time; different bottlenecks may trigger when your system hits various milestones in performance; and, ultimately, your database may look and behave very differently from one month to the next. Additional change comes in many forms. These are a few examples: Code commits can change SQL, though such changes can also come indirectly from ORMs. UI changes can trickle down to database utilization pattern changes. An increase or decrease in the demand for work placed on the database, as  related to site/application traffic. Data no longer fitting into available memory. And finally, anywhere a modern team practices continuous deployment, change becomes a constant factor, and the database's behavior and performance quality require regular attention. Conclusions Although QA testing for a database is inherently different and trickier than QA testing for other layers of a system, it's no less important. There may not be the same set-in-stone principles and tools available for database QA, but if you understand how your queries behave and how they should be behaving, you can make real progress in maintaining your database's quality. Like many aspects of database operations, database QA can seem hard to observe and hard to perform, but once you obtain deeper visibility into how your database functions, many of the best practices are common sense. If you want to learn more about how your data tier may be impacting your organization in unexpected ways, take a look at the free VividCortex ebook The Hidden Costs of Data Engineering, available below.
  5. In this blog post, I’ll talk about multi-threaded slave statistics printed in MySQL error log file. MySQL version 5.6 and later allows you to execute replicated events using parallel threads. This feature is called Multi-Threaded Slave (MTS), and to enable it you need to modify the slave_parallel_workers variable to a value greater than 1. Recently, a few customers asked about the meaning of some new statistics printed in their error log files when they enable MTS. These error messages look similar to the example stated below: [Note] Multi-threaded slave statistics for channel '': seconds elapsed = 123; events assigned = 57345; worker queues filled over overrun level = 0; waited due a Worker queue full = 0; waited due the total size = 0; waited at clock conflicts = 0 waited (count) when Workers occupied = 0 waited when Workers occupied = 0 The MySQL reference manual doesn’t show information about these statistics. I’ve filled a bug report asking Oracle to add information about these statistics in the MySQL documentation. I reported this bug as #85747. Before they update the documentation, we can use the MySQL code to get insight as to the statistics meaning. We can also determine how often these statistics are printed in the error log file. Looking into the file, we find that when you enable MTS – and log-warnings variable is greater than 1 (log-error-verbosity greater than 2 for MySQL 5.7) – the time to print these statistics in MySQL error log is 120 seconds. It is determined by a hard-coded constant number. The code below shows this: /* Statistics go to the error log every # of seconds when --log-warnings > 1 */ const long mts_online_stat_period= 60 * 2; Does this mean that every 120 seconds MTS prints statistics to your MySQL error log (if enabled)? The answer is no. MTS prints statistics in the mentioned period depending on the level of activity of your slave. The following line in MySQL code verifies the level of the slave’s activity to print the statistics: if (rli->is_parallel_exec() && rli->mts_events_assigned % 1024 == 1) From the above code, you need MTS enabled and the modulo operation between the  mts_events_assigned variable and 1024 equal to 1 in order to print the statistics. The mts_events_assigned variable stores the number of events assigned to the parallel queue. If you’re replicating a low level of events, or not replicating at all, MySQL won’t print the statistics in the error log. On the other hand, if you’re replicating a high number of events all the time, and the mts_events_assigned variable increased its value until the remainder from the division between this variable and 1024 is 1, MySQL prints MTS statistics in the error log almost every 120 seconds. You can find the explanation these statistics below (collected from information in the source code): Worker queues filled over overrun level: MTS tends to load balance events between all parallel workers, and the  slave_parallel_workers variable determines the number of workers. This statistic shows the level of saturation that workers are suffering. If a parallel worker queue is close to full, this counter is incremented and the worker replication event is delayed in order to avoid reaching worker queue limits. Waited due to a Worker queue full: This statistic is incremented when the coordinator thread must wait because of the worker queue gets overfull. Waited due to the total size: This statistic shows the number of times that the coordinator thread slept due to reaching the limit of the memory available in the worker queue to hold unapplied events. If this statistic keeps increasing when printed in your MySQL error log, you should resize the slave_pending_jobs_size_max variable to a higher value to avoid the coordinator thread waiting time. Waited at clock conflicts: In the case of possible dependencies between transactions, this statistic shows the wait time corresponding to logical timestamp conflict detection and resolution. Waited (count) when used occupied: A counter of how many times the coordinator saw Workers filled up with “enough” with assignments. The enough definition depends on the scheduler type (per-database or clock-based). Waited when workers occupied: These are statistics to compute coordinator thread waiting time for any worker available, and applies solely to the Commit-clock scheduler. Conclusion Multi-threaded slave is an exciting feature that allows you to replicate events faster, and keep in sync with master instances. By changing the log-warnings variable to a value greater than 1, you can get information from the slave error log file about how multi-threaded performance.